
DF: Nintendo NX Powered By Nvidia Tegra! Initial Spec Analysis


StereoVsn

Member
While the market for office PCs is in decline, NV's PC GPU revenue is actually increasing. (And not insignificantly so -- e.g. by 17% year-over-year for gaming GPUs in the last reported quarter, which was actually before the introduction of Pascal.)
And the HPC market is growing fast and carries fat margins. Throw in the VDI market and its expansion into the cloud, with Nvidia being the only provider there.
 

StereoVsn

Member
Long-term, automotive and deep learning are hanging off a cliff, soon to follow the way of bitcoin mining, and NV are well aware of that. Google have long announced they're moving to in-house ASICs.
Yeah, the HPC/server market is disappearing because Google plans to use custom ASICs... Wait, no, it's growing with or without Google. A lot of software utilizes CUDA, a lot of cloud vendors (like Azure and AWS) utilize Nvidia, and a lot of corporate/university HPC deployments utilize Nvidia. Their commercial side is growing, and growing fast.

And while custom ASICs and quantum computing (another thing Google is using) are starting to be utilized for certain tasks, that still leaves a huge chunk of the market unable, unwilling, or priced out of going that route.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
While the market for office PCs is in decline, NV's PC GPU revenue is actually increasing. (And not insignificantly so -- e.g. by 17% year-over-year for gaming GPUs in the last reported quarter, which was actually before the introduction of Pascal.)
That's good. I didn't check their last quarter, so I was not aware they had that much of an increase. Though, as good as that sounds, they're facing the classic problem there - once you have a market to yourself, and you've cranked the margins to 11, what else can you do to get your next increase?
 
That's good. I didn't check their last quarter, so I was not aware they had that much of an increase. Though, as good as that sounds, they're facing the classic problem there - once you have a market to yourself, and you've cranked the margins to 11, what else can you do to get your next increase?

Founder's Edition
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Yeah, the HPC/server market is disappearing because Google plans to use custom ASICs... Wait, no, it's growing with or without Google. A lot of software utilizes CUDA, a lot of cloud vendors (like Azure and AWS) utilize Nvidia, and a lot of corporate/university HPC deployments utilize Nvidia. Their commercial side is growing, and growing fast.

And while custom ASICs and quantum computing (another thing Google is using) are starting to be utilized for certain tasks, that still leaves a huge chunk of the market unable, unwilling, or priced out of going that route.
All those 'new markets' pushes will be good as long as GPGPU is useful for them. Now, while general-purpose programmability might be an inherent requirement for some of them, it definitely is not for others. Again, for reference see bitcoin mining, which started off massively on GPGPU but within a few years ended up on ASICs that were neither more expensive nor less productive than even the best-suited GPUs. In contrast, HPC is firmly GPGPU territory, as general programmability will always be valued there.
 

dr_rus

Member
Long-term, automotive and deep learning are hanging off a cliff, soon to follow the way of bitcoin mining, and NV are well aware of that. Google have long announced they're moving to in-house ASICs. And so will automotive - again, they're way more margin-sensitive than mobile - it makes zero sense for them to stick to GPGPU in the long run.
Google's ASICs will work with INT8 precision only. It's not as simple as you think to build a good stream processor. Otherwise everyone would have their own GPU on the market tomorrow.

It's their Tegra division that desperately needs to catch traction. Unless you believe it's a free hobby of sorts.

ed: oops.

For the third time: their Tegra division is fine, it caught traction years ago.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Google's ASICs will work with INT8 precision only. It's not as simple as you think to build a good stream processor. Otherwise everyone would have their own GPU on the market tomorrow.
I never said anything about how simple building stream processors was. Google spent a good amount of resources on their designs.

For the third time: their Tegra division is fine, it caught traction years ago.
Well, let's put things into perspective, shall we:

[Image: tegra-revenue_large.png - Tegra revenue over time]

Tegra 3 was the last Tegra that scored in mobile. Mind you, it wasn't a great success, it just had some wins - the T4 was a fiasco in comparison. Notice how their current Tegra revenue has not reached its 2013 levels? So at the end of the day, NV left mobile for greener pastures, only those were not as green. What is worse, automotive will abandon them at the drop of a hat for autonomous-car purposes once the tools of the trade settle down. That would leave Tegras in automotive infotainment alone - as fickle a business as it gets.
 

ethomaz

Banned
Tegra reveal is tomorrow.

Let's see what nVidia has for NX.

BTW, rumor says Tegra Pascal reaches 1.25 TFLOPS... it is from nVidia PR.
 

MuchoMalo

Banned
Slides are out: Nvidia Tegra "Parker": New Denver 2 ARM cores meet Pascal graphics at 16 nm

It's very much a Tegra for the automotive industry. The GPU is essentially the same (well, with the Pascal additions, obviously, but the same SP count); the biggest changes are in the CPU and the doubling of memory bandwidth.

I think it's pretty safe to assume that Nintendo would be using something custom based on the Parker design instead of using Parker itself.

Only 256 CUDA cores? That's not very encouraging.

half-precision?

Most likely. The GPU would need to be over 2GHz otherwise.
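
A quick sanity check of that clock claim - a sketch in Python, assuming the standard peak-FLOPS formula (cores x 2 FLOPs per clock via FMA) and Pascal's 2x FP16 rate; the 1.25 TFLOPS figure is the rumored one from earlier in the thread:

Code:
# Solve for the clock a 256-core Pascal GPU would need
# to hit the rumored 1.25 TFLOPS figure.
CUDA_CORES = 256
TARGET_FLOPS = 1.25e12

# If the figure were FP32 (2 FLOPs per core per clock via FMA):
fp32_clock_ghz = TARGET_FLOPS / (CUDA_CORES * 2) / 1e9
print(f"needed FP32 clock: {fp32_clock_ghz:.2f} GHz")  # ~2.44 GHz

# If it's FP16 (Pascal runs FP16 at 2x the FP32 rate):
fp16_clock_ghz = TARGET_FLOPS / (CUDA_CORES * 2 * 2) / 1e9
print(f"needed FP16 clock: {fp16_clock_ghz:.2f} GHz")  # ~1.22 GHz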
 

KingSnake

The Birthday Skeleton
Slides are out: Nvidia Tegra "Parker": New Denver 2 ARM cores meet Pascal graphics at 16 nm

It's very much a Tegra for the automotive industry. The GPU is essentially the same (well, with the Pascal additions, obviously, but the same SP count); the biggest changes are in the CPU and the doubling of memory bandwidth.

I think it's pretty safe to assume that Nintendo would be using something custom based on the Parker design instead of using Parker itself.

Wouldn't the doubling of memory bandwidth be a very important factor for Nintendo?
 

MuchoMalo

Banned
FP32.

nVidia's site says 2.5 TFLOPS for the 2 Tegra Pascal chips.

I'm not sure if they count the ARM processor FLOPS in that.

http://www.nvidia.com/object/drive-px.html

It doesn't say which it is. There's no chance in hell that the GPU is running at 2.5GHz though, so the logical conclusion is FP16.

For idiots like me: what is the gaming performance of that chip?

Nobody knows. It's not made for gaming, and there are some uncertainties with it.
 

Genio88

Member
You're taking for granted that NX will use this latest Tegra chip; to me it's more likely they'll be using the older one. NX needs to be cheap, and rumors do indeed say it'll be cheap. I guess Nintendo will take advantage of the release of this new Tegra chip so that the older one drops in price; perhaps an NX revision two years from now could use the new Tegra.
After all, that's what consoles always do - for example, PS4 and Xbox One at their 2013 launch used custom AMD GPUs based on 2011 graphics cards.
 

ethomaz

Banned
It doesn't say which it is. There's no chance in hell that the GPU is running at 2.5GHz though, so the logical conclusion is FP16.
This is speculation from an update to an old AnandTech article.

NVIDIA quoted this and not FP16 FLOPS, so it may include a special case operation (ala the Fused Multiply-Add), or even including the performance of the Denver CPU cores.
I believe it is including the ARM FLOPS.

In any case, the total system with the 2x dGPUs is up to 8 TFLOPS, with the 2x Tegras accounting for 2.5 TFLOPS... all FP32.
 

KingSnake

The Birthday Skeleton
NX will launch somewhere in 2017, so Tegra Parker will also be "old" by then. Plus, it would justify the delay of NX to next year. Not that it will use an off-the-shelf Parker, but the custom chip NX will use can benefit from a few things from Parker - 16nm and the doubled memory bandwidth would be the obvious choices.
 

ethomaz

Banned
You're taking for granted that NX will use this latest Tegra chip; to me it's more likely they'll be using the older one. NX needs to be cheap, and rumors do indeed say it'll be cheap. I guess Nintendo will take advantage of the release of this new Tegra chip so that the older one drops in price; perhaps an NX revision two years from now could use the new Tegra.
After all, that's what consoles always do - for example, PS4 and Xbox One at their 2013 launch used custom AMD GPUs based on 2011 graphics cards.
This new Tegra chip is cheaper than the old one... it is way smaller... it's the same 256 SPs at 16nm.

Smaller = cheaper in silicon terms, which is why older chips tend to be more expensive than newer ones.
 
Has Nintendo ever used a bog-standard GPU solution?

Whatever the NX has, it's almost certainly custom-designed, so the core count of this chip wouldn't matter, right? They typically license the architecture and help design their own custom silicon for their consoles. Or am I mistaken?
 

ggx2ac

Member
I posted this elsewhere, can anyone comment? Maybe? It should be image 5 of 9

In one of the slides, there's a graph showing the relative CPU performance of Parker's 2 Denver cores + 4 A57s compared to others, and I noticed one of them is a Huawei with 4 A72 + 4 A53 cores. Does this help with anything regarding performance if, say, Nintendo went with the 2nd-gen Denver cores?
 

MuchoMalo

Banned
This is speculation from an update to an old AnandTech article.


I believe it is including the ARM FLOPS.

In any case, the total system with the 2x dGPUs is up to 8 TFLOPS, with the 2x Tegras accounting for 2.5 TFLOPS... all FP32.

We need more information. You need a very special chip and LN2 to hit 2.5GHz. Including the CPU wouldn't explain it either.
 

Thraktor

Member
Only 256 CUDA cores? That's not very encouraging.

It's not really surprising for an automotive SoC designed to be used in conjunction with a dedicated GPU. There's not much point going crazy on the integrated GPU at that point.

I posted this elsewhere, can anyone comment? Maybe? It should be image 5 of 9

As previously discussed, I wouldn't read too much into benchmarks for Denver, the microarchitecture (or more specifically the use of dynamic code optimisation) is basically built for benchmarks, and how it handles real-world code could be a different matter entirely.

Interestingly, slide 6 actually provides a fascinating little nugget of insight into this:

  • Schedule the task on the right CPU core
  • Move task/thread to the right core as compute needs change
  • Maximise peak performance and responsiveness

This seems to be a confirmation of something I speculated a few months ago, that Nvidia's decision to combine Denver and A57 cores on Parker is a recognition that Denver has some fairly significant performance weak spots, and they're offloading threads which perform poorly on Denver to the A57 cores. I'd be interested to know what kind of heuristics they're using to allocate threads, although I very much doubt that's something they'd ever talk about.
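
Purely as a hypothetical illustration of what such a heuristic could look like - this is not Nvidia's actual (undisclosed) logic, and the counters and thresholds are invented for the sketch:

Code:
from dataclasses import dataclass

@dataclass
class ThreadStats:
    instructions: int   # instructions retired since the last sample
    branch_misses: int  # branch mispredictions since the last sample
    cycles: int         # cycles consumed since the last sample

def pick_cluster(s: ThreadStats) -> str:
    """Route steady, predictable threads to Denver, branchy ones to A57."""
    ipc = s.instructions / max(s.cycles, 1)
    miss_rate = s.branch_misses / max(s.instructions, 1)
    # Denver's dynamic code optimisation shines on hot, predictable loops;
    # irregular control flow defeats it, so those threads go to the A57s.
    if ipc > 1.5 and miss_rate < 0.01:
        return "Denver2"
    return "A57"

# A tight numeric loop (high IPC, few mispredictions) lands on Denver2:
print(pick_cluster(ThreadStats(10_000_000, 20_000, 5_000_000)))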
 

ggx2ac

Member
As previously discussed, I wouldn't read too much into benchmarks for Denver, the microarchitecture (or more specifically the use of dynamic code optimisation) is basically built for benchmarks, and how it handles real-world code could be a different matter entirely.

Interestingly, slide 6 actually provides a fascinating little nugget of insight into this:



This seems to be a confirmation of something I speculated a few months ago, that Nvidia's decision to combine Denver and A57 cores on Parker is a recognition that Denver has some fairly significant performance weak spots, and they're offloading threads which perform poorly on Denver to the A57 cores. I'd be interested to know what kind of heuristics they're using to allocate threads, although I very much doubt that's something they'd ever talk about.

Yes, Blu pointed out in the Eurogamer NX thread how important the HMP it's using is, and he noted the 'proprietary coherent interconnect' from slide 4.
 

dr_rus

Member
FP32.

nVidia's site says 2.5 TFLOPS for the 2 Tegra Pascal chips.

I'm not sure if they count the ARM processor FLOPS in that.

http://www.nvidia.com/object/drive-px.html

That'd mean 2.44GHz for FP32, which is basically impossible without active cooling and in a mobile power envelope. 1.22GHz with FP16 being the basis for this number, OTOH, sounds completely possible on 16FF+ with a Pascal shader core. So we're looking at ~624 GFLOPS of FP32 and 1.25 TFLOPS of FP16.
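
For reference, the arithmetic behind those figures - a sketch assuming peak FLOPS = cores x 2 FLOPs per clock (FMA), with FP16 at twice the FP32 rate on Pascal:

Code:
cores = 256
clock_hz = 1.22e9  # the plausible 16FF+ clock cited above

fp32 = cores * 2 * clock_hz  # ~624 GFLOPS FP32
fp16 = fp32 * 2              # ~1.25 TFLOPS FP16
print(f"FP32: {fp32 / 1e9:.1f} GFLOPS, FP16: {fp16 / 1e12:.2f} TFLOPS")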

Then again it's rather unlikely that Nintendo will use Parker without any customizations as lots of Parker features make sense only in automotive applications.

As previously discussed, I wouldn't read too much into benchmarks for Denver, the microarchitecture (or more specifically the use of dynamic code optimisation) is basically built for benchmarks, and how it handles real-world code could be a different matter entirely.

Interestingly, slide 6 actually provides a fascinating little nugget of insight into this:



This seems to be a confirmation of something I speculated a few months ago, that Nvidia's decision to combine Denver and A57 cores on Parker is a recognition that Denver has some fairly significant performance weak spots, and they're offloading threads which perform poorly on Denver to the A57 cores. I'd be interested to know what kind of heuristics they're using to allocate threads, although I very much doubt that's something they'd ever talk about.

Not sure what you mean, as it handles benchmarks the same way it handles real-world applications - there is no difference. The offloading to the A57 partition is there for tasks which benefit more from multithreading than from high single-thread performance. I would assume that a thread which needs high-performance processing will be mapped to a Denver2 core, while lighter threads will go on the A57s, which are worse in ILP but likely better in perf/watt.
 

thefro

Member
I'd think Denver 2 would be overkill from a CPU standpoint. Plus, the slides don't mention anything about power draw, so it's likely not as power-efficient as some of the other ARM options that would still match or beat PS4/XB1 in CPU performance.

Of course, it's hard to say without more info about these particular Denver processors.
 

MDave

Member
Nintendo could take out the Denver cores, leaving just the 4 ARM cores. Saves a bit of money, die space, and power. Lower the sustained clocks for the CPU and GPU cores to keep it within thermal envelopes - say, 800MHz. Call it custom at that point? On 16nm that should be around 3-4 watts according to my calculations - if they want a small, 3DS-sized console anyway. They can increase power slightly for a bigger device.
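
For what it's worth, the usual back-of-envelope model behind that kind of estimate is that CMOS dynamic power scales roughly with frequency times voltage squared. A sketch - every baseline number here is an invented assumption for illustration, not measured TX1 data:

Code:
# Dynamic power ~ f * V^2 (leakage ignored for simplicity).
base_clock_ghz = 1.0        # assumed reference GPU clock
base_power_w = 6.0          # assumed power at that clock (hypothetical)
base_v, low_v = 1.0, 0.85   # assumed voltage drop at the lower clock

target_clock_ghz = 0.8
scaled_w = base_power_w * (target_clock_ghz / base_clock_ghz) * (low_v / base_v) ** 2
print(f"~{scaled_w:.1f} W")  # ~3.5 W, i.e. in the 3-4 W ballpark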
 

dr_rus

Member
I'd think Denver 2 would be overkill from a CPU standpoint. Plus, the slides don't mention anything about power draw, so it's likely not as power-efficient as some of the other ARM options that would still match or beat PS4/XB1 in CPU performance.

Of course, it's hard to say without more info about these particular Denver processors.

[Slide image: 13-630.2322721985.png]


That first line sounds like Parker should consume less than TX1 while providing more CPU performance - but this is hardly surprising considering the move from 20SOC to 16FF.
 

Oregano

Member
I think if they can get peak X1 performance, but portable and passively cooled, I'd be pretty happy.

That'd be a sizable jump up from the Wii U, and using mixed precision would allow for some damn nice-looking games, especially depending on what resolution they target.
 

TunaLover

Member
With the size of TVs increasing fast, 720p is starting to show its age; 1080p still seems like the sweet spot. I really want to know what resolution they will go for with NX docked. I'm having a hard time looking at Wii U games on my 40" TV - most of them are 720p, and I need to sit about 3 meters away from the screen. In general I'm very tolerant of low resolutions; the problem is that TV manufacturers are moving quickly to UHD, and those panels are not ideal for watching sub-4K content.
 

ethomaz

Banned
I found some interesting info.

nVidia's page spec for Drive PX (the first one) says 2.3 TFLOPS for a dual configuration... that means they are talking about FP16, which means 575 GFLOPS FP32 for each Tegra X1 inside it.

Let's work with that line of thinking...

Drive PX FP32: 575 GFLOPS
Drive PX FP16: 1.15 TFLOPS

Now, how about the Pascal Tegra?

Drive PX2 FP32: 625 GFLOPS?
Drive PX2 FP16: 1.25 TFLOPS?

I don't know if nVidia adds CPU FLOPS to the math or not.
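
Spelling out that unit math - the only assumption in this sketch is the usual FP16 = 2x FP32 rate:

Code:
# Drive PX (dual Tegra X1):
dual_fp16 = 2.3             # Nvidia's spec figure, TFLOPS FP16
px_fp16 = dual_fp16 / 2     # per-chip FP16
px_fp32 = px_fp16 / 2       # per-chip FP32
print(f"TX1: {px_fp16:.2f} TFLOPS FP16 = {px_fp32 * 1000:.0f} GFLOPS FP32")

# Same reasoning applied to the 2.5 TFLOPS dual-Parker figure:
px2_fp16 = 2.5 / 2
px2_fp32 = px2_fp16 / 2
print(f"Parker: {px2_fp16:.2f} TFLOPS FP16 = {px2_fp32 * 1000:.0f} GFLOPS FP32?")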
 

KingBroly

Banned
I found some interesting info.

nVidia's page spec for Drive PX (the first one) says 2.3 TFLOPS for a dual configuration... that means they are talking about FP16, which means 575 GFLOPS FP32 for each Tegra X1 inside it.

Let's work with that line of thinking...

Drive PX FP32: 575 GFLOPS
Drive PX FP16: 1.15 TFLOPS

Now, how about the Pascal Tegra?

Drive PX2 FP32: 625 GFLOPS?
Drive PX2 FP16: 1.25 TFLOPS?

I don't know if nVidia adds CPU FLOPS to the math or not.

That feels like you have those reversed.
 

Peterc

Member
What if it's just a Tegra X1 - would it be OK? Does it matter?

People like to talk about tech, but when comparing games from the 3DS vs. Wii/Wii U, it doesn't really matter that much. If you have a base station that can increase the power to make it equal to the Xbox One, it would be OK.
 

Mokujin

Member
My guess is that the chip in the NX is something a lot closer to the TX1, but featuring 16nm FinFET and some extra work on the memory setup; I don't expect extra CUDA cores or Denver CPU cores. This is what makes the most sense to me with all the available information.

But we already know that Nintendo is known to make a lot of decisions that don't make sense, so there is that.
 