
A Nintendo Switch has been taken apart

ordrin

Member
I don't think anyone is doing that, but going by their naming convention it's the second version of their NX customized chip

I found this quote on GameFAQs.
Seems like a load of BS to me, and I highly doubt that user works with chips at all.
But still, any boffins on here care to pick apart that reasoning?

"
http://67.227.255.239/forum/showthread.php?t=1345524

I work with chips this is what I can tell you.

The u is actually a sideways c we usually denote this as custom

D for designated
N for number

The next 3 letters numbers are the key.
We know this is a nVidia chip so we can state that an X02 is a Tegra X 2... now that's the standard form.. but because it's a custom chip you have to put a revision in.. thus comes the - a2. It's been my experience that in this case the a stands for additional, though it can be anything but... I'm about 95% sure that this in fact is:

Custom designated number Tegra X 2 - additional 2 cores."
 
BTW, AmyS, engines like Unreal 4 already use fp16 efficiently, since it was designed with phones and tablets in mind, so some developers will already have an advantage.

Is LCGeek permanently banned? If so, were they banned on that particular day because of what was shown at the event? Same goes for Nate Drake.

I don't know about LCGeek, but Nate's ban will be lifted in March.

Can you blame us after the Wii U?

That didn't work out to anyone's expectations, but this is definitely not like the Wii U.

I found this quote on GameFAQs.
Seems like a load of BS to me, and I highly doubt that user works with chips at all.
But still, any boffins on here care to pick apart that reasoning?

Yeah.. I don't know if I would trust that.
 

Hermii

Member
What did LCGEEK say that was disproven by the event?

There was the CPU comment that she later backtracked on when she learned of Eurogamer's clock speeds. She knew A57, but not the final clocks. How could one deduce CPU speeds from the event anyway?

Did she say anything else?

NateDrake said he heard Pascal and later said he was sure of Maxwell in the final hardware. Was that why he was banned?
 

KingSnake

The Birthday Skeleton
I found this quote on GameFAQs.
Seems like a load of BS to me, and I highly doubt that user works with chips at all.
But still, any boffins on here care to pick apart that reasoning?

That sounds like pure bullshit. Unless you think the new Shield TV has 2 additional cores. Or the original Shield TV had 1 additional core. Additional compared to what?
Also there is no official X2.
 

AmyS

Member
Hey guys, I guess I overdid it somewhat with that last post. I wasn't even thinking about GAF members, just the ton of speculation I've seen elsewhere of people getting their hopes too high. GAF has a far better grasp of expectations than just about all other discussion places put together.
 

Zedark

Member
Hey guys, I guess I overdid it somewhat with that last post. I wasn't even thinking about GAF members, just the ton of speculation I've seen elsewhere of people getting their hopes too high. GAF has a far better grasp of expectations than just about all other discussion places put together.
It's fine. The funny thing is that about half this thread is people saying we must stop hyping ourselves up and being delusional, and the other half is just "thenumbers,whatdotheymean?.gif", trying to figure out what is possibly true and what isn't. The predictability of the thread's progression is funny, to me at least.
 

z0m3le

Banned
So if a 3rd party asked for more performance in final hardware, we might actually see a clock increase. My thinking is that before final hardware, the battery life target was 5 to 8 hours; if they increased clocks and that brought us down to 2.5 to 6 hours, that makes a great deal of sense to me.

I've been thinking about the clocks tested at Foxconn, and while a GPU upgrade is reasonably plausible, I think the CPU clock might have been pushed a bit higher than final to test thermal limits. It might not be as high as 1.78GHz, and thus it might still be A57. If they went with 1.4GHz on 16nm, the power draw would be close to identical; 1.6GHz would certainly drop battery life at ~2.6 watts for the CPU, and 1.7GHz is 3 watts, which might be a bit much. If they went with a ~3 watt SoC plus other components, you'd see 5 to 6 watts of power draw for the system and a ~3 hour battery life, giving you a 1.6GHz A57 and a 384MHz GPU when undocked.

It's just a thought. The A57 power figures I'm using are from Samsung's 14nm node, so TSMC's 16nm would not be identical, and maybe the difference is enough to fit that 1.78GHz, which is almost a full watt more. That's about 4 watts; if the entire rest of the Switch can fit in 2.5 watts, you might actually be able to push those clocks on 16nm with A57. Though, like I said, they might have pushed higher clocks just for testing, and the CPU clock might actually be 1.6GHz or so. That would give Switch close to five PS4 CPU cores, and Nintendo could always free up cycles on the fourth core to reach the equivalent of six if needed.

What we know is that the Foxconn clock tests almost certainly happened at those numbers, and that final hardware saw a noticeable performance increase.
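
As a back-of-the-envelope check on those battery figures (a rough sketch, assuming a ~16 Wh battery pack; both the pack capacity and the per-component wattages above are estimates, not confirmed specs):

    # Rough battery-life arithmetic, assuming a ~16 Wh pack (estimate, not a confirmed spec)
    battery_wh = 16.0
    scenarios = {
        "light load (menu / simple game)": 2.5,   # watts, illustrative guess
        "~3 W SoC + ~2.5 W screen and rest": 5.5, # the case described above
        "worst case": 6.5,                        # illustrative guess
    }
    for label, watts in scenarios.items():
        print(f"{label}: {watts} W -> {battery_wh / watts:.1f} h")

That lands squarely in the 2.5 to 6 hour range mentioned above, which is why a ~3 watt SoC plus ~2.5 watts for everything else looks like a plausible budget.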
 
What did LCGEEK say that was disproven by the event?

There was the CPU comment that she later backtracked on when she learned of Eurogamer's clock speeds. She knew A57, but not the final clocks. How could one deduce CPU speeds from the event anyway?

Did she say anything else?

NateDrake said he heard Pascal and later said he was sure of Maxwell in the final hardware. Was that why he was banned?

Yeah I'd like to know too. I didn't know they made definitive comments that were completely proven wrong by the event...
 

z0m3le

Banned
There's no guarantee that the clocks from the test are the clocks of the retail unit.

You don't have cooling that can handle 1.78GHz and 921MHz clock speeds for 8 days straight and then turn around and clock that chip at 1GHz and 768MHz; it also wouldn't explain where the noticeable performance increase came from.

If the chip is 16nm, it is very likely that we will see a bump in clocks over Eurogamer's. My post above was entirely about what 16nm would mean and that A57 is still likely as the CPU architecture. I'm willing to wait, but I'll continue to try and make sense of all the facts rather than ignore some.
 
They didn't deserve a ban, and if they did, then how the fuck are all those trolls flaming in Nintendo discussions still free?


All of the speculators have their feet firmly on the ground; this is at most a fascinating academic discussion about the possibilities given the available information (be it false or true). It would be nice to keep things that way.



I think that they upped the power of the Switch sometime last year, maybe after speaking with Capcom. Not by much, maybe, but enough to reduce the battery life and satisfy at least some of the 3rd party devs. I mean, between July and the Foxconn leak (October, December?) there is a lot of time to tweak things.
 

KingSnake

The Birthday Skeleton
You don't have cooling that can handle 1.78GHz and 921MHz clock speeds for 8 days straight and then turn around and clock that chip at 1GHz and 768MHz; it also wouldn't explain where the noticeable performance increase came from.

Point me to where it says that a retail unit was tested for 8 days straight at 1.78GHz and 921MHz clock speeds?
 

z0m3le

Banned
Point me to where it says that a retail unit was tested for 8 days straight at 1.78GHz and 921MHz clock speeds?

http://www.neogaf.com/forum/showpost.php?p=229879660&postcount=836 Post 5 says that one of the units was under test for 11,750 minutes, which works out to just over 8 days; it's right in the middle of that post.

Also, the original test at those speeds that he talked about lasted over 2 hours; if it's not dropping a frame during that test (no throttling), then it should be able to run at those clocks until it burns out.
 

KingSnake

The Birthday Skeleton
http://www.neogaf.com/forum/showpost.php?p=229879660&postcount=836 Post 5 says that one of the units was under test for 11,750 minutes, which works out to just over 8 days; it's right in the middle of that post.

Also, the original test at those speeds that he talked about lasted over 2 hours; if it's not dropping a frame during that test (no throttling), then it should be able to run at those clocks until it burns out.

You didn't answer my question. As you can see from my post at the top of the page I've recently re-read everything. That part says:

One unit has been under test for 11,750 minutes in our factory, still working well.

Nothing about how, or at what clocks. It could have just been powered on and sitting in the main menu. Sounds like a reliability test anyhow.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Cuningas de Häme;230711742 said:
They didn't deserve a ban, and if they did, then how the fuck are all those trolls flaming in Nintendo discussions still free?
Absolutely. Pre-production bits of info can turn invalid later down the production cycle. Changes happen. I personally believe neither LCGeek nor Nate had personal interest in deceiving anybody on these forums, and were acting in full sincerity. At the same time drive-by low-brow shit-posting is mostly left undisturbed. Go figure.

All of the speculators have their feet firmly on the ground; this is at most a fascinating academic discussion about the possibilities given the available information (be it false or true). It would be nice to keep things that way.
These are my expectations from the speculation/analysis threads as well.

Strictly speaking the shader code is GLSL for Julia (not that it matters), but whoever's written the shell script has written the output as single-precision, which seems a very strange thing to do if they've also gone in and changed the code to FP16. Particularly so as they've got a separate double-precision line still there, as opposed to doing both a half-precision and single-precision run, which may make sense on an architecture like this.
Well, for all we know it could be a first run of a test script the guy copied over from his desktop after he put a 'precision mediump float' statement in the Julia shader. What really puzzles me about that story is how a devkit has an actual CLI with a script-capable shell - that would be a very unexpected turn of events, and if Switch devkits actually turn out to run any sort of usable CLI (which from my POV includes busybox-level functionality and some rudimentary compatibility with a well-established linux distro package system), I'll personally buy a handful of devkits for my own purposes.
 
Absolutely. Pre-production bits of info can turn invalid later down the production cycle. Changes happen. I personally believe neither LCGeek nor Nate had personal interest in deceiving anybody on these forums, and were acting in full sincerity. At the same time drive-by low-brow shit-posting is mostly left undisturbed. Go figure.
Sorry for the OT, but this has bugged me since the January reveal.
I couldn't find any kind of feedback forum where admins would explain their current ban policy (aside from the standard rules thread), because something must have happened when whole threads are allowed to go down the toilet because of drive-by "lol Nintendo" posts.
 

z0m3le

Banned
You didn't answer my question. As you can see from my post at the top of the page I've recently re-read everything. That part says:



Nothing about how, or at what clocks. It could have just been powered on and sitting in the main menu. Sounds like a reliability test anyhow.

I addressed that in the second line. Look, stop playing games. It's just nitpicking to draw a line between 2 hours and 8 days, because if it's capable of running for 2 hours, it's capable of running until it burns out.

The Switch isn't coming out years from now; it's out at the end of next week. If you have so much trouble with the Foxconn leak being real, then ignore it. I don't have to change your mind.
 
Absolutely. Pre-production bits of info can turn invalid later down the production cycle. Changes happen. I personally believe neither LCGeek nor Nate had personal interest in deceiving anybody on these forums, and were acting in full sincerity. At the same time drive-by low-brow shit-posting is mostly left undisturbed. Go figure.


These are my expectations from the speculation/analysis threads as well.

 

KingSnake

The Birthday Skeleton
I addressed that in the second line. Look, stop playing games. It's just nitpicking to draw a line between 2 hours and 8 days, because if it's capable of running for 2 hours, it's capable of running until it burns out.

Are you talking about this 2 hours:

The demo screen is very boring, just a bunch of fish swimming around. The factory floor is very noisy so I couldn't tell if it generates much noise. I touched it and it wasn't too warm after running for 2 hours, so it's not too bad.
Brightness really isn't very good.

or this:

The advantage of Nintendo Switch is no frequency throttling. There's no lag whatsoever after running for 2 hours straight. Not like mobile phones that can last only 1 minute on full performance.
?

Because those say nothing about clocks either.

Actually the leaker says nothing about clocks until post 5, where he gives the "standard specs":

Here's some standard specs: CPU 1750 MHz, GPU 912 MHz, EMC 1600 MHz.

We know the "standard specs" of X1 for example, there are not far from this.

The Switch isn't coming out years from now; it's out at the end of next week. If you have so much trouble with the Foxconn leak being real, then ignore it. I don't have to change your mind.

As this post (and many others) proves, I have no issue with the Foxconn leak being real. I have an issue with reading more into it than it says.
 

AzaK

Member
People just hoping against hope that the Switch custom Tegra has:

-512 GPU cuda cores (na, it's going to be 256)

-600 Gflops to 1 TF (nope, it's going to be right around ~393 Gflops (fp32) when docked and 157 Gflops (fp32) undocked as a portable)
-That FP16 capability somehow doubles the performance (it won't, could be useful in some situations of rendering but that will take time to utilize, give it a few generations of software)

- Pascal GPU architecture (probably not, it's gonna be Maxwell, but there's not a whole lot of difference between Maxwell 2.0 and Pascal anyway)
- 128-bit bus for 50 GB/sec bandwidth (nope, it's going to be 64-bit and ~25 GB/sec)

- Denver2 or A72 CPU cores (it's gonna be four A57s).

- 16nm FinFET (unlikely, probably 20nm)

- Maybe even simply, somewhat higher CPU and/or GPU clock speeds from what Eurogamer leaked.

I feel that a small clock speed bump from those July dev kits is the only fairly reasonable possibility.

Finally some sense. For some weird reason people still think that when presented with 2 options, Nintendo would go for the more modern, higher end version. They won't. They will try and see if they can build an acceptable system with the lowest options.
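
For anyone wondering where the ~393 and ~157 GFLOPS figures in the quote above come from, it's just the standard cores x 2 FLOPs per clock formula applied to Eurogamer's rumoured clocks (the 256-core count is itself still an assumption):

    # FP32 GFLOPS = CUDA cores * 2 FLOPs per core per clock * clock in GHz
    cores = 256  # assumed core count, per the speculation above
    for mode, clock_mhz in [("docked", 768.0), ("portable", 307.2)]:
        gflops = cores * 2 * clock_mhz / 1000
        print(f"{mode}: {gflops:.0f} GFLOPS fp32")  # ~393 docked, ~157 portable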
 

z0m3le

Banned
Are you talking about this 2 hours:



or this:


?

Because those say nothing about clocks either.

Actually the leaker says nothing about clocks until post 5, where he gives the "standard specs":



We know the "standard specs" of X1 for example, there are not far from this.



As this post (and many others) proves, I have no issue with the Foxconn leak being real. I have an issue with reading more into it than it says.

And where did he see those clocks? For instance, he flat out tells you the CPU clock and GPU clock in the post you quoted; does that mean those are the final clocks and weren't part of any test at all? Because that is the assumption you are now drawing.
 
Finally some sense. For some weird reason people still think that when presented with 2 options, Nintendo would go for the more modern, higher end version. They won't. They will try and see if they can build an acceptable system with the lowest options.

Hey, blu and some others were just talking about you. ;)
 

etking

Banned
Production cost may be barely over $100; together with the display, controllers and everything, I would guess $150 max. So they have plenty of room for future price drops, because the system is way too expensive to appeal to more than hardcore early adopters.
 

valouris

Member
Production cost may be barely over $100; together with the display, controllers and everything, I would guess $150 max. So they have plenty of room for future price drops, because the system is way too expensive to appeal to more than hardcore early adopters.

The Joy-Con should be around $50-60, with so much stuff built into such a small space, and the tablet is definitely much more than $100. I'd say the whole cost should be around $250, probably more with all the accessories included (grip, Joy-Con sliding grips, cables).
 
Production cost may be barely over $100; together with the display, controllers and everything, I would guess $150 max. So they have plenty of room for future price drops, because the system is way too expensive to appeal to more than hardcore early adopters.

Didn't a lot of Japanese developers talk about their amazement at how Nintendo managed to sell the Switch at 300 dollars? They all thought it would cost more because of the stuff that's inside.
 

KingSnake

The Birthday Skeleton
And where did he see those clocks? For instance, he flat out tells you the CPU clock and GPU clock in the post you quoted; does that mean those are the final clocks and weren't part of any test at all? Because that is the assumption you are now drawing.

My point was that those clocks could as well be the maximum clocks for the SoC "on paper" (specifications) or could as well be the maximum clocks in tests (if they stress tested it), but not necessarily the clocks at which a retail unit is running in normal operation mode.

I'm not drawing a definitive conclusion, I'm saying it is very much in the air what those clocks refer to and it's far from being a fact even when you assume that the Foxconn leak is real in totality (and it's already proven that some parts are true and some parts are false).
 

Hermii

Member
It would make sense that they would run a test with the fan and all the components running at full speed to see if it holds up.
 

sits

Member
And where did he see those clocks? For instance, he flat out tells you the CPU clock and GPU clock in the post you quoted; does that mean those are the final clocks and weren't part of any test at all? Because that is the assumption you are now drawing.

Let him go, z0m3le, he's a bit LTTFoxconnP. I think we thoroughly dissected it in the previous Switch HW thread. You're being baited into giving him a summary of a thread he can go read for himself.
 
Agree with blu here; gamers in general are far, far too hostile toward insiders for getting things wrong, when there are many, many legitimate reasons why an insider will get things wrong that aren't them trying to mislead us.

And where did he see those clocks? For instance, he flat out tells you the CPU clock and GPU clock in the post you quoted; does that mean those are the final clocks and weren't part of any test at all? Because that is the assumption you are now drawing.

I think I see where KingSnake is coming from here. In that new translation there is no connection of those clock speeds to a demo like I believe there was in the original translation. So we don't know if he saw a spec sheet with that info (1.78GHz max, 921MHz max, though he also mentions 16nm in this context) or if he did see a demo with those readouts as we originally thought.
 

Thraktor

Member
You'd think with his level of knowledge and experience he would at least know they were Samsung modules if he saw the writing. It's just interesting that he explicitly mentioned he wasn't able to identify it. It could be as KingSnake said above that he just wasn't able to look closely enough at it.

Samsung could make any number of components in the Switch, though, from RAM to NAND to the SoC itself, so simply being able to identify it as a Samsung chip wouldn't tell him much. He correctly identified them as RAM modules, which seems sensible from their size and positioning on the board, but any further details (i.e. capacity, speed, etc.) would have required looking up the product code, and I don't think he would have been in a position to do that.

Hmm that's interesting, so it could be that he has seen some pre-final retail units, or possibly the motherboard might differ between different retail units? I wonder if the unit in the OP has one of the older or newer motherboards. I guess only the leaker would be able to tell us that

Yeah, it would be interesting to compare it to the iFixit teardown in a couple of weeks, as any changes between the two would indicate if it's an early prototype or dev unit or something like that.

Oooh that's a really cool image, thanks! As someone who studied materials science that's a fascinating picture, it really shows the value of space and creative mask production.

And yeah I figured the other image wasn't exactly what it would look like but it is a fair representation of the various component sizes, right?

I wouldn't even take it as a rough layout. The interfaces around the sides of the chip are probably about right, but the main blocks (CPU/GPU) probably don't really bear any relation to Nvidia's mockup.

Well, for all we know it could be a first run of a test script the guy copied over from his desktop after he put a 'precision mediump float' statement in the Julia shader. What really puzzles me about that story is how a devkit has an actual CLI with a script-capable shell - that would be a very unexpected turn of events, and if Switch devkits actually turn out to run any sort of usable CLI (which from my POV includes busybox-level functionality and some rudimentary compatibility with a well-established linux distro package system), I'll personally buy a handful of devkits for my own purposes.

Yeah, the CLI access may be a bit odd, although as I've never interacted with console development hardware I wouldn't really know whether that's normal or not (however a full featured Linux package manager may be pushing it a little). One thing that did strike me as unusual is that they use Windows folder notation (i.e. \ ) rather than *nix notation ( / ) when running the script. Admittedly I've (as far as I can recall) only ever used *nix machines to remote access other *nix machines, and have never used Powershell, so I'm not sure what the conventions are in that respect.
 

Ninja Dom

Member
Production cost may be barely over $100; together with the display, controllers and everything, I would guess $150 max. So they have plenty of room for future price drops, because the system is way too expensive to appeal to more than hardcore early adopters.

But then Nintendo would factor in the huge R&D costs and the marketing costs.
 

DESTROYA

Member
The teardown is great, but I'm still pretty bummed about the lackluster launch titles besides Zelda; wish they had just a couple more to justify getting it day one.
Love what the Switch brings to the table, but man, what a letdown on the software side.
 

z0m3le

Banned
Agree with blu here; gamers in general are far, far too hostile toward insiders for getting things wrong, when there are many, many legitimate reasons why an insider will get things wrong that aren't them trying to mislead us.



I think I see where KingSnake is coming from here. In that new translation there is no connection of those clock speeds to a demo like I believe there was in the original translation. So we don't know if he saw a spec sheet with that info (1.78GHz max, 921MHz max, though he also mentions 16nm in this context) or if he did see a demo with those readouts as we originally thought.

I see what he is doing, and it leads nowhere. The only things the leak got wrong were assumptions. The clocks relate to Eurogamer's clocks, before Eurogamer's clocks were a thing, and fall in line with Tegra's frequency base.

The maximum clocks of the chip should actually be over 2GHz for the CPU and 1GHz for the GPU, so these shouldn't be max clocks; we can see the relation to the X1 pretty strongly.

He'd only have access to these clocks through a demo or some sort of spec sheet, and why would a Foxconn spec sheet have clocks on it? We are also still dealing with a translation of the original wording, so it's hard to say why things changed, but removing the clocks from a demo context makes them more set in stone than if they were just part of a test.

My main point is that ignoring the Foxconn leak is an error when speculating on Switch's specs. I'm not saying they are the final specs, but I do believe he got them from a demo, as he wouldn't be in a position to know such information otherwise, and making the clocks up would require a good deal of fairly specialized information outside his possible knowledge (Eurogamer's clocks).

When we are so close to the truth, it's best to keep our minds open to the possibility of solid leaks like Eurogamer's and Foxconn's, to understand the timing involved, and to speculate on explanations for them.
 

mrklaw

MrArseFace
The teardown is great, but I'm still pretty bummed about the lackluster launch titles besides Zelda; wish they had just a couple more to justify getting it day one.
Love what the Switch brings to the table, but man, what a letdown on the software side.

For a positive spin, how about: the lack of games gives you a lower barrier to entry.
- Don't have the money for lots of launch games? No problem.
- Don't have the time to play lots of launch games? No problem.

It's a bit twisted, but as I assume I'll buy in for Mario Odyssey anyway, I might as well get it now and then deal with games one at a time, rather than suddenly having a bunch that I won't have time to play at Christmas.
 
I see what he is doing, and it leads nowhere. The only things the leak got wrong were assumptions. The clocks relate to Eurogamer's clocks, before Eurogamer's clocks were a thing, and fall in line with Tegra's chip capacity.

The maximum clocks of the chip should actually be over 2GHz for the CPU and 1GHz for the GPU, so these shouldn't be max clocks; we can see the relation to the X1 pretty strongly.

He'd only have access to these clocks through a demo or some sort of spec sheet, and why would a Foxconn spec sheet have clocks on it? We are also still dealing with a translation of the original wording, so it's hard to say why things changed, but removing the clocks from a demo context makes them more set in stone than if they were just part of a test.

My main point is that ignoring the Foxconn leak is an error when speculating on Switch's specs. I'm not saying they are the final specs, but I do believe he got them from a demo, as he wouldn't be in a position to know such information otherwise, and making the clocks up would require a good deal of fairly specialized information outside his possible knowledge (Eurogamer's clocks).

When we are so close to the truth, it's best to keep our minds open to the possibility of solid leaks like Eurogamer's and Foxconn's, to understand the timing involved, and to speculate on explanations for them.

I'm not trying to dismiss the Foxconn leak (and I don't think KingSnake is either) but I think it's certainly worthwhile to try and decipher the context around all of the leaker's info. He implied throughout the leaks that he was in touch with other people in the factory who knew more than he did, so it could be possible he saw a spec sheet in someone's office.

And maybe this hypothetical spec sheet is for operating clocks and not max clocks, as Foxconn might need to know what types of chips to toss out if they can't reach said clocks with the thermal ceiling given. (I don't know if this is how it's done- TSMC would naturally do most of the binning but maybe Foxconn has a QA system for the SoCs as well due to damage they may have sustained during transport from TSMC?)

Basically, I'm not doubting the leaker got this info accurate, but it's worth attempting to figure out what the info actually pertains to and what it means for retail hardware. The other interesting thing about reading through the translation after the OP here is that the description of changing motherboards leads me to believe there have been a few hardware revisions, so the one in the OP could very well not be a retail version.

I'm still curious if there is enough room on that board to replace the 10x15mm RAM modules with the other ones that Thraktor found that were square (no idea of the dimensions) such that the 10x15mm units were just placeholders. The bottleneck caused by the RAM bandwidth seems very un-Nintendo, although I guess there could be on-die solutions to compensate.
 

Thraktor

Member
I'm still curious if there is enough room on that board to replace the 10x15mm RAM modules with the other ones that Thraktor found that were square (no idea of the dimensions) such that the 10x15mm units were just placeholders. The bottleneck caused by the RAM bandwidth seems very un-Nintendo, although I guess there could be on-die solutions to compensate.

I don't think there's any reason to believe that the system's particularly bottlenecked by bandwidth, even at 25.6GB/s. Within the first few months we have at least two games running at 1080p/60fps using deferred shading (Fast RMX and MK8D), which shouldn't be the case for a heavily bandwidth-bottlenecked machine.

It seems like TBR (combined with Vulkan to allow it to be used for intermediate buffers) is doing its job well in controlling bandwidth requirements, although I would expect developers will have to be careful with non-tilable effects like DoF blurring.
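
To put a rough number on why tiling matters, here's a back-of-the-envelope sketch of the intermediate-buffer traffic a 1080p60 deferred renderer would otherwise push through main memory (the G-buffer layout below is hypothetical, just to show the order of magnitude):

    # Hypothetical 1080p60 deferred-shading traffic if nothing stayed on-chip
    width, height, fps = 1920, 1080, 60
    gbuffer_bytes_per_px = 16        # e.g. three colour targets + depth at 4 bytes each (assumed)
    final_colour_bytes_per_px = 4
    # write the G-buffer, read it back in the lighting pass, write the final colour
    traffic_per_px = gbuffer_bytes_per_px * 2 + final_colour_bytes_per_px
    gb_per_s = width * height * traffic_per_px * fps / 1e9
    print(f"~{gb_per_s:.1f} GB/s for intermediate buffers alone")  # ~4.5 GB/s

That's several GB/s out of a ~25.6 GB/s budget before textures, geometry or CPU traffic even enter the picture, which is exactly the part TBR keeps in tile memory.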
 

z0m3le

Banned
People just hoping against hope that the Switch custom Tegra has:
I've been in both this thread and the Curious one; in neither has anyone been seriously speculating about 512 CUDA cores, 600+ GFLOPS fp32, Pascal or Denver 2.
-512 GPU cuda cores (na, it's going to be 256)
Absolutely, Switch is 256 cuda cores from what we can tell about the size of the die, and we have no rumors supporting anything more.
-600 Gflops to 1 TF (nope, it's going to be right around ~393 Gflops (fp32) when docked and 157 Gflops (fp32) undocked as a portable)
-That FP16 capability somehow doubles the performance (it won't, could be useful in some situations of rendering but that will take time to utilize, give it a few generations of software)
No one is suggesting this; FP16 is a real thing, though, and some engines are starting to do this automatically. AFAIK all pixel work can be done in FP16, for instance, and we have real developers saying they can run about 70% of their code in FP16 with no artifacts.

- Pascal GPU architecture (probably not, it's gonna be Maxwell, but there's not a whole lot of difference between Maxwell 2.0 and Pascal anyway)
- 128-bit bus for 50 GB/sec bandwidth (nope, it's going to be 64-bit and ~25 GB/sec)
There is virtually no difference between X1 and Pascal in terms of gaming and performance, especially if X1 has shrunk to 16nm. I do think Nintendo is more likely to use extra cache rather than a wider memory bus.

- Denver2 or A72 CPU cores (it's gonna be four A57s).
A72 is possible, although it may not actually be needed to explain the Foxconn clocks.
- 16nm FinFET (unlikely, probably 20nm)
This is untrue. It would be lazy of Nintendo, but 16nm should be cheaper when everything is taken into account, and larger chips from their competitors have already moved to 16nm, including the PS4 Pro's, which would be many times more expensive than Switch's SoC. 20nm is just not as viable, and we only even suggest it because the TX1 was 20nm.
- Maybe even simply, somewhat higher CPU and/or GPU clock speeds from what Eurogamer leaked.
I feel that a small clock speed bump from those July dev kits is the only fairly reasonable possibility.
Yeah, I'm starting to think that the CPU clock from the Foxconn leak might have been just to push the hardware, though I do now think that power consumption got worse in final hardware: performance increased, and not much else seems to have changed from the July devkits, so a clock speed increase would be the obvious explanation, and we had reports from multiple insiders that the target battery life was 5 to 8 hours and it ended up being 2.5 to 6. For reference, if they shrunk the chip to 16nm, the CPU shouldn't draw any more power to hit 1.4GHz than 1GHz on 20nm, and it would likely be higher than that, suggesting 1.6GHz, as this should give the SoC a ~3 watt power draw on 16nm.
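
A first-order way to see why a node shrink buys that clock headroom is plain textbook dynamic-power scaling, P ~ C*V^2*f (the voltage and capacitance ratios below are illustrative guesses, not actual Tegra figures):

    # First-order CMOS dynamic power: P is proportional to C * V^2 * f
    def relative_power(f_ratio, v_ratio, c_ratio):
        # ratios are new/old; all values here are illustrative, not measured
        return c_ratio * v_ratio**2 * f_ratio

    # Baseline: 20nm at 1.0 GHz. Assume the shrink cuts switched capacitance ~20%
    # and allows ~15% (resp. ~10%) lower voltage at the higher clocks (guesses).
    print(relative_power(1.4, 0.85, 0.8))  # ~0.81x: 1.4 GHz for less power than baseline
    print(relative_power(1.6, 0.90, 0.8))  # ~1.04x: 1.6 GHz for roughly the same power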
 

LordOfChaos

Member
Also people tend to be overly favourable in comparisons to GCN in the XBO and PS4. I think someone assumed a 40% performance advantage over a comparable GCN part, which is just unsubstantiated fudge factoring. Given how well AMD parts perform in Vulkan-optimised games, the "Nvidia vs. AMD flops" argument may not have any bearing on console.


That's a good point. Nvidia paper flops performing ~30% better than AMD paper flops was on legacy APIs like DX11 and OpenGL, but the gap tends to disappear on low level APIs. Consoles would be running such low level APIs and with more customization for set hardware, so we actually don't know what difference to factor in.
 

z0m3le

Banned
I'm not trying to dismiss the Foxconn leak (and I don't think KingSnake is either) but I think it's certainly worthwhile to try and decipher the context around all of the leaker's info. He implied throughout the leaks that he was in touch with other people in the factory who knew more than he did, so it could be possible he saw a spec sheet in someone's office.

And maybe this hypothetical spec sheet is for operating clocks and not max clocks, as Foxconn might need to know what types of chips to toss out if they can't reach said clocks with the thermal ceiling given. (I don't know if this is how it's done- TSMC would naturally do most of the binning but maybe Foxconn has a QA system for the SoCs as well due to damage they may have sustained during transport from TSMC?)

Basically, I'm not doubting the leaker got this info accurate, but it's worth attempting to figure out what the info actually pertains to and what it means for retail hardware. The other interesting thing about reading through the translation after the OP here is that the description of changing motherboards leads me to believe there have been a few hardware revisions, so the one in the OP could very well not be a retail version.

I'm still curious if there is enough room on that board to replace the 10x15mm RAM modules with the other ones that Thraktor found that were square (no idea of the dimensions) such that the 10x15mm units were just placeholders. The bottleneck caused by the RAM bandwidth seems very un-Nintendo, although I guess there could be on-die solutions to compensate.

This is my point with the spec sheet talk: if it was seen, then the demos would be running those clocks anyway, as those are operational clocks. One reading is reasonably speculative and the other is excessively speculative and only nitpicks the details.

The speculation about the board changes is something I've been trying to talk about for a few pages, and it's why I suggested the unit in the OP was a prototype. The timing never seemed to make sense to me: if you are ready to prototype final hardware, final devkits should come out around the same time, because you have software deadlines to meet, and wasn't that the entire reason there was a delay to begin with?

The memory could go either way (higher bandwidth or a larger cache); I tend to think a larger cache sounds more Nintendo. We've heard from multiple sources that the Nintendo Switch is the easiest platform to develop for, and a large cache would likely solve many memory issues; it could be handled fairly easily with only 4MB or so. It is worth noting that the Wii U's 45nm, 157mm^2 GPU held 35MB of embedded memory, so 4MB should be fairly trivial.

Again, I think the Foxconn clocks suggest that the demos were run at those clocks regardless of where he got his info, since that is the assumption you have to make. It is possible that they max-clock tested the chip instead, but those clocks should be higher than the Foxconn leak's and would line up more with that blue paper floating around.

That's a good point. Nvidia paper flops performing ~30% better than AMD paper flops was on legacy APIs like DX11 and OpenGL, but the gap tends to disappear on low level APIs. Consoles would be running such low level APIs and with more customization for set hardware, so we actually don't know what difference to factor in.

We do see the Nvidia GTX 750 Ti (1.39 TFLOPS) keep up with PS4 (1.843 TFLOPS) performance in many games, and that is exactly the type of gap we've been led to believe exists between AMD and Nvidia. The difference is when a game uses heavy async compute, as some Vulkan titles and DX12 generally do; that lets the AMD card edge out the Nvidia one, but flop for flop there is still an advantage for Nvidia. We also still need to see better driver support from Nvidia for these new APIs. Switch should have very good API utilization, as its API (NVN) is built around the chip itself.
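
Putting a number on that 750 Ti comparison (paper specs only; real-world results vary per game):

    ps4_tflops = 1.843       # paper fp32 figure
    gtx_750ti_tflops = 1.39  # paper fp32 figure
    print(f"PS4 has ~{ps4_tflops / gtx_750ti_tflops - 1:.0%} more paper FLOPS")
    # -> ~33%; if the two trade blows in practice, that is roughly the
    #    "Nvidia flops vs AMD flops" gap people quote for legacy APIs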
 

LordOfChaos

Member
FWIW, the 4MB SRAM block in the A8 was ~4.9 mm2, so with an 80% linear shrink (64% area shrink) it would be ~3.1 mm2.

Actually, with four removed A53 cores (0.7mm2) that could match closely enough to look about the same size for the total die.
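
The area arithmetic spelled out (reading the 0.7 mm2 as per A53 core, which is an assumption on my part):

    sram_mm2_a8 = 4.9                   # 4 MB SRAM block in Apple's A8, per the post above
    shrunk_sram = sram_mm2_a8 * 0.8**2  # 80% linear shrink -> 64% of the area
    freed_a53 = 4 * 0.7                 # four removed A53 cores at ~0.7 mm2 each (assumed per-core)
    print(f"shrunk SRAM ~{shrunk_sram:.1f} mm2 vs freed A53 area ~{freed_a53:.1f} mm2")
    # ~3.1 mm2 vs ~2.8 mm2, close enough that the total die would look about the same size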
 

Rodin

Member
I don't think there's any reason to believe that the system's particularly bottlenecked by bandwidth, even at 25.6GB/s. Within the first few months we have at least two games running at 1080p/60fps using deferred shading (Fast RMX and MK8D), which shouldn't be the case for a heavily bandwidth-bottlenecked machine.

It seems like TBR (combined with Vulkan to allow it to be used for intermediate buffers) is doing its job well in controlling bandwidth requirements, although I would expect developers will have to be careful with non-tilable effects like DoF blurring.

That seems to be causing a few issues in Breath of the Wild, and I wonder if that's the reason they kept the 900p resolution instead of going with 1080p. But then again, MK8 has DoF as well in replay mode, and it's a 1080p game on Switch.

I wonder if they managed to mitigate the issue for the final version of the game, and whether with more time in the oven (or if the game had been made with the Switch in mind from scratch) they would have managed to find some workarounds and boost it to 1080p.

FWIW, the 4MB SRAM block in the A8 was ~4.9 mm2, so with an 80% linear shrink (64% area shrink) it would be ~3.1 mm2.

Actually, with four removed A53 cores (0.7mm2) that could match closely enough to look about the same size for the total die.

Makes sense. I wonder what kind of bandwidth we can expect from the eventual SRAM.

This is absolutely not true. We have done FP16 before, on the RSX and other platforms, and that tale is full of woe: discovering where the precision breaks down in lighting and you need to do local promotion to avoid artifacts, and, on architectures that require additional instructions to pack/unpack values into registers, whether the ALU savings amortize out over the shuffling overhead.

Well, another developer (sebbbi) claimed that he managed to write 70% of some game's code in FP16; he's not making this up.

Now, no one is expecting every Switch game to use FP16 for 70% of its code, but acting like it's completely irrelevant, as some people are doing, is ridiculous.
 
No one is suggesting this; FP16 is a real thing, though, and some engines are starting to do this automatically. AFAIK all pixel work can be done in FP16, for instance, and we have real developers saying they can run about 70% of their code in FP16 with no artifacts.

This is absolutely not true. We have done FP16 before, on the RSX and other platforms, and that tale is full of woe: discovering where the precision breaks down in lighting and you need to do local promotion to avoid artifacts, and, on architectures that require additional instructions to pack/unpack values into registers, whether the ALU savings amortize out over the shuffling overhead.
 
I don't think there's any reason to believe that the system's particularly bottlenecked by bandwidth, even at 25.6GB/s. Within the first few months we have at least two games running at 1080p/60fps using deferred shading (Fast RMX and MK8D), which shouldn't be the case for a heavily bandwidth-bottlenecked machine.

It seems like TBR (combined with Vulkan to allow it to be used for intermediate buffers) is doing its job well in controlling bandwidth requirements, although I would expect developers will have to be careful with non-tilable effects like DoF blurring.

Didn't we see a post not long ago from a Ubi developer saying that 25.6GB/s would indeed be a huge bottleneck even accounting for TBR? Specifically because of these types of post processing effects like depth of field you mention? It could be that MK8 and RMX are able to run at 1080p/60fps because there is a large on-die cache or something.

Regarding the DoF issue in Zelda mentioned by Digital Foundry, I believe that was specifically for the demo version and didn't even occur consistently. So it might be something we don't see in the retail version which would indicate the effective bandwidth is higher than 25.6GB/s right?

That's a good point. Nvidia paper flops performing ~30% better than AMD paper flops was on legacy APIs like DX11 and OpenGL, but the gap tends to disappear on low level APIs. Consoles would be running such low level APIs and with more customization for set hardware, so we actually don't know what difference to factor in.

There is still the difference in efficiency between the more modern Maxwell architecture and the older R7000 (I think) architecture in PS4/XB1, which should mean the Maxwell flops do perform better than R7000 flops. Maybe not by 40% or 30% but there should be some gains purely from more modern architecture.

The other thing to mention is, if the NVN API is as good as everyone is saying it is, couldn't that be a similar advantage to the one Nvidia hardware has in a PC environment? Again, maybe not to the same extent, but still potentially an effective flop advantage. I think a game like Snake Pass (which struggles to reach 60fps on PS4) running at 1080p/30fps locked on Switch shows that the Switch will surely be punching above its weight when it comes to pure raw numbers.
 

z0m3le

Banned
This is absolutely not true. We have done FP16 before, on the RSX and other platforms, and that tale is full of woe: discovering where the precision breaks down in lighting and you need to do local promotion to avoid artifacts, and, on architectures that require additional instructions to pack/unpack values into registers, whether the ALU savings amortize out over the shuffling overhead.

The RSX was not a modern GPU; didn't it use fixed-function hardware, and wasn't it a big issue with the PS3? I'm sure the PS3 was a huge problem, but I'm not sure you can compare it to a modern GPU doing FP16 code, as these were designed around it. The 70% figure for an ex-Ubisoft developer's code using FP16 is his own statement. UE4 also supports FP16, and AMD is pushing FP16 heavily with Vega, so expect to use it more in AAA products if that is where you work, because the entire industry is about to start pushing it hard.

Also, mobile has been using FP16 for years now, so it's nothing new, and weren't old fixed-function shaders FP16?
 

LordOfChaos

Member
There is still the difference in efficiency between the more modern Maxwell architecture and the older R7000 (I think) architecture in PS4/XB1, which should mean the Maxwell flops do perform better than R7000 flops. Maybe not by 40% or 30% but there should be some gains purely from more modern architecture.

The other thing to mention is, if the NVN API is as good as everyone is saying it is, couldn't that be a similar advantage to the one Nvidia hardware has in a PC environment? Again, maybe not to the same extent, but still potentially an effective flop advantage. I think a game like Snake Pass (which struggles to reach 60fps on PS4) running at 1080p/30fps locked on Switch shows that the Switch will surely be punching above its weight when it comes to pure raw numbers.

There's still a difference, but probably not the 40% a number of people have been factoring in.

I would think that among the low-level APIs there's more of a point of diminishing returns than with legacy APIs. E.g., AMD has 70% shader utilization on DX11 and Nvidia has 85%, just for example. If on a low-level API one had 95% and the other 99%, it would of course be less notable than on older, higher-overhead APIs.

That is, DX12 and Vulkan, when a game uses them well, don't appear to perform much differently.
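
With those illustrative utilization figures, the relative advantage shrinks a lot, which is the whole point:

    # Utilization numbers from the post above (examples, not measurements)
    legacy_gap = 0.85 / 0.70 - 1  # Nvidia vs AMD on a high-overhead API
    lowlvl_gap = 0.99 / 0.95 - 1  # both near the ceiling on a low-level API
    print(f"legacy API advantage ~{legacy_gap:.0%}, low-level API advantage ~{lowlvl_gap:.0%}")
    # -> ~21% vs ~4%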
 

z0m3le

Banned
There's still a difference, but probably not the 40% a number of people have been factoring in.

I would think that among the low-level APIs there's more of a point of diminishing returns than with legacy APIs. E.g., AMD has 70% shader utilization on DX11 and Nvidia has 85%, just for example. If on a low-level API one had 95% and the other 99%, it would of course be less notable than on older, higher-overhead APIs.

That is, DX12 and Vulkan, when a game uses them well, don't appear to perform much differently.

I do agree with the idea of dropping the performance advantage as a hard number; I've been doing this for a while now. The advantage that Switch is going to have over the other base consoles this generation is FP16, which can be used to achieve higher performance than it otherwise would.

I throw around the 70% number here, meaning 70% of the available shader work can run in FP16 instead, but it should be made crystal clear that this is a very high number that not all games are going to achieve; many people might want to think of it as more like 40-50% of the available flops. On top of that, the fact that Nvidia flops outperform AMD flops is something to take note of, but I agree it likely won't be a 4/3 split and is probably closer to something like 10% to 20%, though as the industry makes FP16 standard again, we will see later games in Switch's library get closer and closer to that 70% number.

It helps to keep in mind that flops are a theoretical number, and software isn't going to hit 100% of that, certainly not in every frame of every game. So when I say something like "Switch could be capable of achieving 800 GFLOPS plus the Nvidia advantage", that is a combination of all the above, including the Foxconn clocks and the advantage Nvidia should see over AMD's 2011 architecture found in XB1 and PS4, as the X1 is a four-years-newer architecture (it's a noticeable step from Maxwell towards Pascal, not that there is a noticeable performance gap here). PS: that number would be ~680 GFLOPS plus the Nvidia advantage with Eurogamer's clocks.
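
For what it's worth, here's roughly where the ~800 and ~680 GFLOPS figures come from, using the same counting convention as the post (the FP16 share is counted at double rate; the 70% share and the Foxconn clock are both unconfirmed):

    # Effective throughput if a share of the shader work runs at double-rate FP16
    def effective_gflops(fp32_gflops, fp16_share):
        return fp32_gflops * ((1 - fp16_share) + 2 * fp16_share)

    cores = 256  # assumed core count
    for label, clock_mhz in [("Foxconn docked", 921), ("Eurogamer docked", 768)]:
        fp32 = cores * 2 * clock_mhz / 1000
        print(f"{label}: {fp32:.0f} GFLOPS fp32 -> ~{effective_gflops(fp32, 0.70):.0f} effective")
    # -> ~472 fp32 / ~802 effective with the Foxconn clock, ~393 / ~668 with Eurogamer's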
 
Also people tend to be overly favourable in comparisons to GCN in the XBO and PS4. I think someone assumed a 40% performance advantage over a comparable GCN part, which is just unsubstantiated fudge factoring. Given how well AMD parts perform in Vulkan-optimised games, the "Nvidia vs. AMD flops" argument may not have any bearing on console.

The other issue is that docked mode may not provide a substantial boost in games that are already using the max bandwidth mode.

To me the most optimistic scenario will be Shield Android TV level performance with the thin OS and API offsetting some of the performance loss from the decreased clock speed.

The RX 480 performs a little bit better (like ~5% on average, I think) in DX12 titles compared to Nvidia's 1060. That's with ~5.8 vs. ~4.4 TFLOPS.
 