
Nintendo NX rumored to use Nvidia's Pascal GPU architecture

No, more power efficient means they "could" decide to clock it higher (because less power = less heat). So it's either 40% more powerful at the same power draw, or equally powerful at 60% less power draw... if I'm understanding it correctly. They could of course choose to go for something a bit more powerful (+/-20%) while still drawing (+/-30%) less power.

But considering this is also for a handheld, I'm pretty sure Nintendo will go for the 60% less power draw.

Yup, it's either one or the other. Or maybe something in between.
 

MuchoMalo

Banned
So, basically, we'll get TX1 performance at 60% less power. Which is what I thought to begin with. An added 40% performance (or speed) bump would be too good to be true.

Even X1 performance is still pushing it, honestly. It could end up a bit faster than X1 if density improvements made room for additional SMs, which would actually make it possible to exceed that 40/60%, but since that's not the case it's not too likely.

However, I did do some calculations and came to the conclusion that it's possible to have an SoC with 3 SMs and 8 A72 cores at under 100mm², so... :/ (which also means that the X1 may not be as big as previously said in this thread)
 

10k

Banned
Even X1 performance is still pushing it, honestly. It could end up a bit faster than X1 if density improvements made room for additional SMs, which would actually make it possible to exceed that 40/60%, but since that's not the case it's not too likely.

However, I did do some calculations and came to the conclusion that it's possible to have an SoC with 3 SMs and 8 A72 cores at under 100mm², so... :/ (which also means that the X1 may not be as big as previously said in this thread)
Which would be beastly. Even 4 A72 and 4 A53 would smack jaguar.
 

MuchoMalo

Banned
That's ridiculous. The Wii doesn't use shaders. It uses TEV, a much more primitive version of it. They're not even compatible.

I stand corrected. I don't remember every technical detail of every GPU ever, sadly. :p Either way, it was something Nintendo put on the GPU which obviously ate some die space.

Which would be beastly. Even 4 A72 and 4 A53 would smack jaguar.

I'm pretty sure that Nintendo would want games running off of only one type of core, so 4x4 wouldn't make sense. Either all are the same, 4x2, or 6x2.
 
Do we even have confirmation on the fixed-function shaders? Ever since the Wii we haven't had any concrete info on what the systems are capable of. I don't even know what the polygon count is for the Wii U.

Pretty sure a shrunken-down, bare-bones version of the fixed-function hardware on Wii was included as a discrete block on Wii U's GPU. I think I even identified which block it was back in the day. But it's only used in Wii mode.

There are also fixed function hardware interpolators on the GPU, but that was common to all GPUs in the R700 line.
 

Zil33184

Member
Edit 2: Referencing this article http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview/3

They were testing the X1 and claimed the GPU was running at 1.5W in their tests using "Manhattan"?

So if we just applied a 40% reduction to that?

Then it would go down to 0.9W?

Of course this is just guesswork, so feel free to dispute it.

The article doesn't specify the TX1's clock speed for that test, only that it was underclocked.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
That's ridiculous. The Wii doesn't use shaders. It uses TEV, a much more primitive version of it. They're not even compatible.
Actually TEV is approx. functionally-equivalent to SM 1.4. Just saying.
 

Thraktor

Member
This doesn't really change anything because this is simply physics at work. The issues are cost and the size of the chip.

The size of the die isn't actually that big of a deal; even something in the 200mm² range could be accommodated on a typical Nintendo handheld mainboard, and larger dies actually dissipate heat more efficiently. Cost is an issue, but you're still only looking at around $5 extra per SM. That's not a trivial amount when you're looking at a handheld BoM, but it's also not impossible that Nintendo could find room for it by reducing costs in other areas (e.g. the display).

Sure, but it would be a much larger jump from TX1 than I (and most of us) were expecting. That would theoretically mean it might push around 700Gflops while drawing less than half the power? This could be huge, especially for "handheld" mode (if there even is a dock mode).

Maybe Thraktor or equivalent nerd
<3 Thraktor
can see how close to full speed the TX2 could get in handheld mode if this is true? It might explain the lack of different 'modes' (handheld/dock)? And might explain if (!) the TX1 in the devkits was actually overclocked.

My equivalent nerd is on sabbatical, but in theory you can get a 40% increase in clock speed and a 60% reduction in power consumption per unit performance at the same time (see Pascal desktop GPUs, which are close to that), although it's simpler to think about it as a 60% reduction in power consumption at the same clocks.
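To put rough, purely illustrative numbers on that, here's a quick sketch of the different readings, using the ~1.5W at 500MHz TX1 GPU figure mentioned later in the thread as a baseline (assumed values, not measurements):

Code:
# Sketch of the readings of TSMC's "40% higher speed" / "60% power saving"
# claims. Baseline figures are assumptions, not measurements.
base_clock_mhz = 500    # assumed 20nm TX1 GPU clock
base_power_w   = 1.5    # assumed GPU-only power at that clock

speed_gain   = 0.40     # "40% higher speed" at the same power
power_saving = 0.60     # "60% power saving" at the same speed

# Reading 1: keep the clock, take the power saving.
same_clock_power = base_power_w * (1 - power_saving)             # 0.60 W @ 500 MHz

# Reading 2: keep the power, take the clock gain.
same_power_clock = base_clock_mhz * (1 + speed_gain)             # 700 MHz @ 1.5 W

# Combined framing above: +40% clock while power per unit performance
# drops by 60%, i.e. absolute power = 1.4 * 0.4 = 0.56x the baseline.
combined_power = base_power_w * (1 + speed_gain) * (1 - power_saving)  # 0.84 W @ 700 MHz

print(same_clock_power, same_power_clock, combined_power)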

Don't expect 700 Gflops from the NX GPU in handheld mode (well, not FP32, anyway). I put together some estimates for what might be feasible in a handheld at 16nm, which you can see below:

1x SM:

1000 mW - 780 MHz - 200 Gflops FP32 - 400 Gflops FP16
1500 mW - 915 MHz - 234 Gflops FP32 - 468 Gflops FP16
2000 mW - 1025 MHz - 262 Gflops FP32 - 525 Gflops FP16

2x SM:

1000 mW - 595 MHz - 305 Gflops FP32 - 609 Gflops FP16
1500 mW - 700 MHz - 358 Gflops FP32 - 717 Gflops FP16
2000 mW - 780 MHz - 400 Gflops FP32 - 800 Gflops FP16

3x SM:

1000 mW - 510 MHz - 392 Gflops FP32 - 783 Gflops FP16
1500 mW - 600 MHz - 461 Gflops FP32 - 922 Gflops FP16
2000 mW - 670 MHz - 515 Gflops FP32 - 1030 Gflops FP16

4x SM:

1000 mW - 455 MHz - 466 Gflops FP32 - 932 Gflops FP16
1500 mW - 535 MHz - 548 Gflops FP32 - 1096 Gflops FP16
2000 mW - 595 MHz - 609 Gflops FP32 - 1219 Gflops FP16

(The power consumption figures above are for the GPU alone, not the full SoC)
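For anyone wondering where the Gflops columns come from, they line up with SMs × 128 CUDA cores × 2 FLOPs per clock (FMA) × clock speed, with FP16 at twice the FP32 rate as on Tegra-class Pascal. A minimal check:

Code:
# Reproduces the Gflops columns above, assuming 128 CUDA cores per Pascal SM,
# one FMA (2 FLOPs) per core per clock, and double-rate FP16.
def gflops(n_sm, clock_mhz, cores_per_sm=128, flops_per_clock=2):
    fp32 = n_sm * cores_per_sm * flops_per_clock * clock_mhz / 1000.0
    return fp32, 2 * fp32    # (FP32, FP16)

print(gflops(2, 595))   # ~(305, 609) -- the 2x SM / 1000 mW row
print(gflops(3, 510))   # ~(392, 783) -- the 3x SM / 1000 mW row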

I would expect 2 SMs as the likely scenario, with 3 SMs as an outside chance. I don't think 4 SMs is very likely at all, but I included it in there for completeness.

Regarding power consumption, I would expect a ~2W draw for the entire SoC to be most likely (meaning 1-1.5W for the GPU). It might be pushed up as high as 3W, if there's ample battery and the case can efficiently dissipate heat (i.e. is made of aluminium), which would push the GPU power draw up to perhaps 2W or thereabouts. The practical limit for a passively cooled SoC like this is probably around 5W (for a continuous high workload like a videogame), but that would result in a pretty hot outer casing, so I can't imagine Nintendo going that high (even ignoring the effect on battery life).

If there's a separate "docked mode", which I've yet to be convinced of, then the power draw depends entirely on whatever cooling method they have in place, but in theory there's nothing stopping them clocking the GPU up to 1.4GHz or so, as long as they can adequately dissipate 20W or so of heat.
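As a rough sanity check on what those power budgets mean for battery life, here's a sketch where every figure is an assumption: a nominal 3.7V Li-ion cell, the 4000mAh capacity floated later in the thread, the ~1.5-3W screen figures blu quotes further down, and ~0.5W for everything else.

Code:
# Very rough battery-life sketch; every number is an assumption.
def runtime_hours(battery_mah, soc_w, screen_w, other_w=0.5, voltage=3.7):
    energy_wh = battery_mah * voltage / 1000.0     # cell energy in watt-hours
    return energy_wh / (soc_w + screen_w + other_w)

print(runtime_hours(4000, soc_w=2.0, screen_w=1.5))   # ~3.7 h
print(runtime_hours(4000, soc_w=3.0, screen_w=3.0))   # ~2.3 h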
 
Pretty sure a shrunken-down, bare-bones version of the fixed-function hardware on Wii was included as a discrete block on Wii U's GPU. I think I even identified which block it was back in the day. But it's only used in Wii mode.

There are also fixed function hardware interpolators on the GPU, but that was common to all GPUs in the R700 line.

Well, that really wouldn't be a factor in the Wii U's power if it's only used in Wii mode.

Actually TEV is approx. functionally-equivalent to SM 1.4. Just saying.

Yes, but from what I understand, the instructions for more normal shader models can't work on the Wii. That's why so many games looked bad, because they barely even took advantage of some of the more complex abilities of the TEV.
 

ggx2ac

Member
The article doesn't specify the TX1's clock speed for that test, only that it was underclocked.

Thanks, I was in a rush to find a reference and didn't notice the clock speed when glancing through. I mistakenly did a 40% reduction instead of 60%, although that doesn't really matter now.
 

ozfunghi

Member
My equivalent nerd is on sabbatical, but in theory you can get a 40% increase in clock speed and a 60% reduction in power consumption per unit performance at the same time (see Pascal desktop GPUs, which are close to that), although it's simpler to think about it as a 60% reduction in power consumption at the same clocks.

So that 40+60 doesn't mean what we (mortals) think it means, and we should think of it as 60% less power draw for the same performance... right?

(The power consumption figures above are for the GPU alone, not the full SoC)

I would expect 2 SMs as the likely scenario, with 3 SMs as an outside chance. I don't think 4 SMs is very likely at all, but I included it in there for completeness.

Regarding power consumption, I would expect a ~2W draw for the entire SoC to be most likely (meaning 1-1.5W for the GPU). It might be pushed up as high as 3W, if there's ample battery and the case can efficiently dissipate heat (i.e. is made of aluminium), which would push the GPU power draw up to perhaps 2W or thereabouts. The practical limit for a passively cooled SoC like this is probably around 5W (for a continuous high workload like a videogame), but that would result in a pretty hot outer casing, so I can't imagine Nintendo going that high (even ignoring the effect on battery life).

Are these numbers based on what Pascal/Parker/X2 would produce, or based on X1?
 

heidern

Junior Member
Don't expect 700 Gflops from the NX GPU in handheld mode (well, not FP32, anyway). I put together some estimates for what might be feasible in a handheld at 16nm, which you can see below:

I assume your numbers are for a handheld sized device. If NX was a tablet what do you think would be possible then?

Also, I was wondering, what are the disadvantages of FP16 compared to FP32?
 
So that 40+60 doesn't mean what we (mortals) think it means, and we should think of it as 60% less power draw for the same performance... right?



Are these numbers based on what Pascal/Parker/X2 would produce, or based on X1?

He says that those numbers are based on a 16nm process which means Pascal, not a stock TX1.

As for 60% less power consumption and 40% increased clock speed I'm certainly confused now too. The way I originally read it was you can get 60% better (less) power consumption for the same clock speed, or 40% increased clock speed for the same power consumption, but apparently it's not that simple (nothing is).
 

Eradicate

Member
Nooooo stop!

You're giving the dock people some ammo!

"It's the dock! The dock I tells you! The dock will give NX all the power!"

/jk

 
Well, I do hope NX will age better because... well, Google's your friend.

Not in this case :(


Anywho, since I think the dock is the most interesting part of this whole NX leak, would any developers here know if this scenario is possible:

Nintendo has handed out devkits to third parties and given them the basic information, but is saving some info for the reveal. Developers can now start developing/porting games for the specifications provided, but Nintendo is actually waiting for the reveal to tell them (and everyone) that the final dock will have a cooling solution which allows the handheld to upclock to, say, 1.4 GHz to get the specs a lot closer to XB1 when docked.

Presumably a September reveal gives them enough time to get their games optimized for both power levels by the March launch, right? Implementing a simple fan in the dock or handheld (which only runs when docked) shouldn't cost much extra, right? Much less so than adding a second GPU to the dock I would think...

It's just so odd to me that they have this opportunity to increase performance for very little cost, yet we have heard nothing about that. The only downside I can see here is slightly higher failure rates (fan, higher clocks) and asking developers to optimize for two discrete power levels, but since the latter is being done in this industry anyway I don't see how it can be that big of a downside.
 

atbigelow

Member
Not in this case :(


Anywho, since I think the dock is the most interesting part of this whole NX leak, would any developers here know if this scenario is possible:

Nintendo has handed out devkits to third parties and given them the basic information, but is saving some info for the reveal. Developers can now start developing/porting games for the specifications provided, but Nintendo is actually waiting for the reveal to tell them (and everyone) that the final dock will have a cooling solution which allows the handheld to upclock to, say, 1.4 GHz to get the specs a lot closer to XB1 when docked.

Presumably a September reveal gives them enough time to get their games optimized for both power levels by the March launch, right? Implementing a simple fan in the dock or handheld (which only runs when docked) shouldn't cost much extra, right? Much less so than adding a second GPU to the dock I would think...

It's just so odd to me that they have this opportunity to increase performance for very little cost, yet we have heard nothing about that. The only downside I can see here is slightly higher failure rates (fan, higher clocks) and asking developers to optimize for two discrete power levels, but since the latter is being done in this industry anyway I don't see how it can be that big of a downside.

They're not going to hide crucial technical aspects like that from developers. People need to know certain things before they can make games.
 
They're not going to hide crucial technical aspects like that from developers. People need to know certain things before they can make games.

It might not be "hiding" so much as "finalizing" though your point definitely stands. I guess we shouldn't really expect this variable clock function if no one has heard about it. Then we'll be pleasantly surprised if it happens!
 

ozfunghi

Member
OH GOD, why did I search for current Kelly LeBrock x_x

To stick to 1980's movies, this quote comes to mind: The light that burns twice as bright burns half as long.

Anywho, since I think the dock is the most interesting part of this whole NX leak, would any developers here know if this scenario is possible:

Nintendo has handed out devkits to third parties and given them the basic information, but is saving some info for the reveal. Developers can now start developing/porting games for the specifications provided, but Nintendo is actually waiting for the reveal to tell them (and everyone) that the final dock will have a cooling solution which allows the handheld to upclock to, say, 1.4 GHz to get the specs a lot closer to XB1 when docked.

Presumably a September reveal gives them enough time to get their games optimized for both power levels by the March launch, right? Implementing a simple fan in the dock or handheld (which only runs when docked) shouldn't cost much extra, right? Much less so than adding a second GPU to the dock I would think...

It's just so odd to me that they have this opportunity to increase performance for very little cost, yet we have heard nothing about that. The only downside I can see here is slightly higher failure rates (fan, higher clocks) and asking developers to optimize for two discrete power levels, but since the latter is being done in this industry anyway I don't see how it can be that big of a downside.

First of all, I don't think that would work, because devs would dismiss the platform even more easily if they aren't aware of the true potential, like in your scenario (I mean, developing for a 700GF device must appeal to more devs than a 250GF device, for instance). Devs that were on board would also be rather pissed, finding out after the fact that they could have targeted a different performance range. Basically, it would turn off devs (not aware of the true potential), it would piss off the other devs, it would be bad for basically everybody. Extra/double work for the devs that are on board, turning away devs, less software support... bad for business, bad for support, bad for sales.
 

Mr Swine

Banned
The bolded isn't true. In fact, pretty much the opposite is true, as in a tightly thermally constrained environment (i.e. a handheld) the marginal benefit to increased parallelism (i.e. more SMs) can be quite large.

To demonstrate, let's look at a power curve for Pascal I've put together. Unlike my previous power curves for A72 and A53 CPU clusters (which are based on solid real-world data from Anandtech and should be considered reasonably accurate), this is a much rougher approximation based on just four data points:

- TSMC's claims of "40% higher speed" and "60% power saving" over 20nm, each applied separately to the TX1's GPU drawing 1.5W at 500MHz (divided by 2 for 750mW per SM).
- Power draw readings from the GTX1080 before and after overclocking (full board power readings, minus GDDR5X, divided by number of SMs).

Obviously I'm extrapolating a lot from fairly poor data, but hopefully it should be in the right ballpark, and enough for our discussion in any case. (I should also note that this isn't strictly a measure of power draw for the SMs themselves, but rather a measure of the draw of an entire Pascal GPU "per SM", so including other components like ROPs, TMUs, etc., assuming they're always in roughly the same proportion to SMs). In any case, here's the power curve:

pascal_powercurve.png


The important thing to note is that, like virtually all IC power curves, it's not linear, and for a given increase in clock speed you require a much larger increase in power consumption to get you there. What this means is that you'll get better performance by using more SMs at a lower clock speed than fewer SMs at a higher clock speed.
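A toy model makes the point. Assume, purely for illustration, that per-SM power grows roughly with the cube of clock speed (P ~ f·V², with voltage rising alongside frequency), anchored to the 1x SM @ ~1W point from the tables; these are not the fitted numbers from the curve above, just an assumed exponent.

Code:
# Toy model: if per-SM power grows roughly with clock^3, then within a fixed
# GPU power budget, more SMs at lower clocks win. Anchor point is an
# assumption (1 SM @ 780 MHz ~ 1000 mW).
def clock_for_budget(budget_mw, n_sm, ref_clock=780.0, ref_power_mw=1000.0, exp=3.0):
    per_sm_mw = budget_mw / n_sm
    return ref_clock * (per_sm_mw / ref_power_mw) ** (1.0 / exp)

def gflops_fp32(n_sm, clock_mhz):
    return n_sm * 128 * 2 * clock_mhz / 1000.0   # 128 cores/SM, FMA = 2 FLOPs

for n in (1, 2, 3):
    f = clock_for_budget(1000.0, n)
    print(f"{n}x SM @ {f:.0f} MHz -> {gflops_fp32(n, f):.0f} Gflops FP32 in ~1 W")
# Prints roughly 200, 317 and 415 Gflops: the numbers differ a bit from the
# fitted table below, but the trend (wider-and-slower wins) is the same.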

Let's look at the clock speed (and raw floating point performance) that could be achieved with different numbers of SMs within the power constraints we might expect for a handheld GPU:

1x SM:

1000 mW - 780 MHz - 200 Gflops FP32 - 400 Gflops FP16
1500 mW - 915 MHz - 234 Gflops FP32 - 468 Gflops FP16
2000 mW - 1025 MHz - 262 Gflops FP32 - 525 Gflops FP16

2x SM:

1000 mW - 595 MHz - 305 Gflops FP32 - 609 Gflops FP16
1500 mW - 700 MHz - 358 Gflops FP32 - 717 Gflops FP16
2000 mW - 780 MHz - 400 Gflops FP32 - 800 Gflops FP16

3x SM:

1000 mW - 510 MHz - 392 Gflops FP32 - 783 Gflops FP16
1500 mW - 600 MHz - 461 Gflops FP32 - 922 Gflops FP16
2000 mW - 670 MHz - 515 Gflops FP32 - 1030 Gflops FP16

As you can see, a 3x SM configuration can achieve nearly the same performance with 1000mW that a 2x SM configuration can with twice that, and a full 50% more than a 1x SM config can manage with 2000mW at hand.

This isn't to say that I expect a 3x SM GPU in the NX, but there would certainly be a sizeable performance jump over 2x SMs if they decided to do so.

When I think about it, I wouldn't be surprised if Nintendo went with 1x SM at 780MHz just to keep costs down and get long battery life, while still being more powerful than the Wii U, but at 540p resolution.
 
First of all, I don't think that would work, because devs would dismiss the platform even more easily if they aren't aware of the true potential, like in your scenario (I mean, developing for a 700GF device must appeal to more devs than a 250GF device, for instance). Devs that were on board would also be rather pissed, finding out after the fact that they could have targeted a different performance range. Basically, it would turn off devs (not aware of the true potential), it would piss off the other devs, it would be bad for basically everybody. Extra/double work for the devs that are on board, turning away devs, less software support... bad for business, bad for support, bad for sales.

I don't mean the specs would change drastically, I mean a second target power range would be introduced, like the Neo was apparently introduced as a second power range not long ago.

Would developers really be that upset if they have a second power level introduced 6 months before the system launch? I suppose this all depends on the communication from Nintendo- if they had said "the NX will have specs in X range, develop for this" and then after reveal say "you also need to have it running at Y range, sorry" that could be pretty damaging. But if it was something like "the NX may be two devices with different power levels, though we don't know yet. Here's a devkit for one NX, have fun" then it might actually be seen as a good thing for developers to have a much higher power level to target.

Also if the leaked EG devkits were primarily for developing games for the handheld screen (i.e. 540p-720p) a new docked power level designed to scale a game from, say, 720p to 1080p on a TV screen shouldn't be that big of a burden for developers.


Eh this is all speculation and rambling... it's been too long since the leak.
 

Schnozberry

Member
First of all, I don't think that would work, because devs would dismiss the platform even more easily if they aren't aware of the true potential, like in your scenario (I mean, developing for a 700GF device must appeal to more devs than a 250GF device, for instance). Devs that were on board would also be rather pissed, finding out after the fact that they could have targeted a different performance range. Basically, it would turn off devs (not aware of the true potential), it would piss off the other devs, it would be bad for basically everybody. Extra/double work for the devs that are on board, turning away devs, less software support... bad for business, bad for support, bad for sales.

Last-minute changes to hardware happen more often than you think. The PS4 was upgraded to 8GB of RAM late in the cycle at the reveal conference, and the Xbox 360 was upgraded from the expected 256MB of RAM to 512MB. There were also clock speed shifts for the Wii U CPU and GPU, the Nvidia GPU in the PS3, and the GPU and EDRAM in the Xbox 360.
 

MDave

Member
I wonder how many watts the SoCs use on the Vita and 3DS. Then we'd know what sort of maximum wattage they'd want for the NX's SoC.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Yes, but from what I understand, the instructions for more normal shader models can't work on the Wii. That's why so many games looked bad, because they barely even took advantage of some of the more complex abilities of the TEV.
The one thing TEV was missing from the shader models of the time was dot3 ops. It could still compute those, but at the expense of multiple ops and thus much lower throughput. Otherwise it was pretty potent (in the context of early shader models).
 

Doctre81

Member
So, basically, we'll get TX1 performance at 60% less power. Which is what I thought to begin with. An added 40% performance (or speed) bump would be too good to be true.

I'm sure the leap from X1 to X2 will be about the same or greater than the leap from K1 to X1.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Screens are about 1.5W for 5", 3W for 7".
The parallax barrier screen on the 3DS is a hog. Also, OLED can (and normally does) have higher peak consumption than LCD.
 

Theonik

Member
The parallax barrier screen on the 3DS is a hog. Also, OLED can (and normally does) have higher peak consumption than LCD.
OLED consumption also depends on how bright the colours are on the screen at any given time though.
 

Theonik

Member
It's possible that the 3DS's SoC actually uses less than 2W... Oh dear. :(
3D screens, especially passive/auto-stereoscopic ones like the 3DS's, have the issue that not only do they require twice the pixels they normally would in order to cram in the second eye's image, they also waste a lot of brightness on the filter used for the 3D effect, so they need to be driven much brighter to compensate and burn tons of power, in addition to being more expensive.
 

MuchoMalo

Banned
Quite likely.

3D screens, especially passive/auto-stereoscopic ones like the 3DS's, have the issue that not only do they require twice the pixels they normally would in order to cram in the second eye's image, they also waste a lot of brightness on the filter used for the 3D effect, so they need to be driven much brighter to compensate and burn tons of power, in addition to being more expensive.

So, even 2W might be a little optimistic for NX. Definitely not looking good.
 

MuchoMalo

Banned
That's a weird conclusion to jump to, considering the lack of a 3D screen (and dual screens) means they could have a beefier SoC and retain a similar power envelope.

Hence "might." Also, I'm very confident in a 720p screen at 6-8". I can only hope that, if I'm right, Nintendo decides to throw more battery life at the problem before gimping the hardware. A 4000 mAh battery can't be all that expensive, can it?
 

Oregano

Member
Hence "might." Also, I'm very confident in a 720p screen at 6-8". I can only hope that, if I'm right, Nintendo decides to throw more battery life at the problem before gimping the hardware. A 4000 mAh battery can't be all that expensive, can it?

The problem is you say might and then say "definitely not looking good", but it's not looking like anything because that's a scenario you thought up. They might decide to make people use AA batteries to power the device; that's not sounding good, right?
 

Rodin

Member
So, even 2W might be a little optimistic for NX. Definitely not looking good.

3DS SoC drawing <2W doesn't automatically mean that NX will do the same thing, especially if the form factor is as large as you're suggesting. It's a rather different product.
 

MuchoMalo

Banned
3DS SoC drawing <2W doesn't automatically mean that NX will do the same thing, especially if the form factor is as large as you're suggesting. It's a rather different product.

It'll largely depend on battery size, but I can see Nintendo cutting corners there.
 

ggx2ac

Member
It'll largely depend on battery size, but I can see Nintendo cutting corners there.

So what you're saying is, after the hardware divisions merged, they got together at a meeting with the new manager (seeing that Genyo Takeda is not taking a hands-on role as a 'fellow'), and during the meeting this new manager says, "Hey, that Wii U was pretty good, ain't it? Let's try to do that again, but in handheld form."

And because I'm not an expert on Japanese culture: the rest of the workers at the meeting just looked at each other and then uttered "Hai!" so as to avoid being shamed.
 

Rodin

Member
It'll largely depend on battery size, but I can see Nintendo cutting corners there.

The Samsung Galaxy S7's 3000mAh battery is $3.65. Sadly I can't find the price for the 3600mAh battery used in the S7 Edge, but I don't see it being twice as expensive. The only reason they'd skimp on the battery would be to avoid adding too much weight, which would make the device uncomfortable; cost shouldn't be a major issue.

Might be kinda relevant that 4GB of LPDDR4 RAM is $25 and the Snapdragon 820 SoC is the most expensive component ($62, although that should include some phone-related functions the NX doesn't need). The QHD 5.1" screen is $55.

These numbers can obviously vary, but they should give us an idea of how much a certain component can cost (or at least the ballpark, except for the screen which will be 720p at most).
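Summing those up as a very crude sketch: the prices are the S7 teardown figures above used as placeholders, and the $30 display is just a guess for a 720p panel well under the $55 QHD part.

Code:
# Crude BoM sketch built from the phone component prices quoted above;
# every figure is a placeholder, not an NX estimate.
bom_usd = {
    "SoC":        62.00,   # Snapdragon 820 figure, includes phone-only parts
    "4GB LPDDR4": 25.00,
    "battery":     3.65,   # 3000 mAh S7 cell
    "display":    30.00,   # assumed: 720p panel, well below the $55 QHD part
}
print(sum(bom_usd.values()))   # ~120, before storage, case, controls, assembly...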
 