Digital Foundry: Nintendo Switch CPU and GPU clock speeds revealed

Peltz

Member
I haven't had time to read through every response here, so I'm probably repeating what others have already said, but here are my thoughts on the matter, anyway:

CPU Clock

This isn't really surprising, given (as predicted) CPU clocks stay the same between portable and docked mode to make sure games don't suddenly become CPU limited when running in portable mode.

The overall performance really depends on the core configuration. An octo-core A72 setup at 1GHz would be pretty damn close to PS4's 1.6GHz 8-core Jaguar CPU. I don't necessarily expect that, but a 4x A72 + 4x A53 @ 1GHz should certainly be able to provide "good enough" performance for ports, and wouldn't be at all unreasonable to expect.

Memory Clock

This is also pretty much as expected, as 1.6GHz is the standard LPDDR4 clock speed (which I guess confirms LPDDR4, not that there was a huge amount of doubt). Clocking down in portable mode is sensible: lower resolution means smaller framebuffers, which means less bandwidth needed, so they can squeeze out a bit of extra battery life by cutting it down.

Again, though, the clock speed is only one factor. There are two other things that can come into play here. The second factor, obviously enough, is the bus width of the memory. Basically, you're either looking at a 64-bit bus, for 25.6GB/s, or a 128-bit bus, for 51.2GB/s of bandwidth. The third is any embedded memory pools or cache that are on-die with the CPU and GPU. Nintendo hasn't shied away from large embedded memory pools or cache before (just look at the Wii U's CPU, its GPU, the 3DS SoC, the n3DS SoC, etc.), so it would be quite out of character for them to avoid such customisations this time around. Nvidia's GPU architectures from Maxwell onwards use tile-based rendering, which allows them to use on-die caches to reduce main memory bandwidth consumption, which ties in quite well with Nintendo's habits in this regard. Something like a 4MB L3 victim cache (similar to what Apple uses on their A-series SoCs) could potentially reduce bandwidth requirements by quite a lot, although it's extremely difficult to quantify the precise benefit.
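To make the bus width arithmetic above concrete, here is a minimal sketch. It assumes the usual double data rate behaviour (two transfers per clock) and treats the 1.6GHz figure as the memory clock; the function name is just an illustration, not anything from Nintendo or Nvidia.

```python
def peak_bandwidth_gb_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s for a double-data-rate memory bus."""
    transfers_per_second = clock_mhz * 1e6 * 2   # DDR: two transfers per clock
    bytes_per_transfer = bus_width_bits / 8      # bus width in bytes
    return transfers_per_second * bytes_per_transfer / 1e9


# The two bus widths discussed in the post, at the 1.6GHz docked memory clock.
for width in (64, 128):
    print(f"{width}-bit bus @ 1600 MHz: {peak_bandwidth_gb_s(1600, width):.1f} GB/s")
```

These are peak numbers only; any on-die cache or embedded memory would change the effective bandwidth in ways this sketch does not model.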

GPU Clock

This is where things get a lot more interesting. To start off, the relationship between the two clock speeds is pretty much as expected. With a target of 1080p in docked mode and 720p in undocked mode, there's a 2.25x difference in pixels to be rendered, so a 2.5x difference in clock speeds would give developers a roughly equivalent amount of GPU performance per pixel in both modes.
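The resolution and clock ratios above can be checked with a few lines. This is only a sketch under the post's own assumptions (1080p docked, 720p portable, a 2.5x clock ratio), and it ignores everything that doesn't scale with pixel count, such as geometry work.

```python
docked_pixels = 1920 * 1080      # 1080p target in docked mode
portable_pixels = 1280 * 720     # 720p target in portable mode

pixel_ratio = docked_pixels / portable_pixels   # how many more pixels docked mode renders
clock_ratio = 2.5                                # docked GPU clock / portable GPU clock

# GPU throughput available per pixel, docked relative to portable.
per_pixel_ratio = clock_ratio / pixel_ratio

print(f"pixel ratio: {pixel_ratio:.2f}x")
print(f"docked throughput per pixel vs portable: {per_pixel_ratio:.2f}x")
```

So docked mode ends up with roughly 11% more GPU throughput per pixel than portable mode, which is where the "roughly equivalent" comes from.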

Once more, though, and perhaps most importantly in this case, any interpretation of the clock speeds themselves is entirely dependent on the configuration of the GPU, namely the number of SMs (also ROPs, front-end blocks, etc, but we'll assume that they're kept in sensible ratios).

Case 1: 2 SMs - Docked: 384 GF FP32 / 768 GF FP16 - Portable: 153.6 GF FP32 / 307.2 GF FP16

I had generally been assuming that 2 SMs was the most likely configuration (as, I believe, had most people), simply on the basis of allowing for the smallest possible SoC which could meet Nintendo's performance goals. I'm not quite so sure now, for a number of reasons.

Firstly, if Nintendo were to use these clocks with a 2 SM configuration (assuming 20nm), then why bother with active cooling? The Pixel C runs a passively cooled TX1, and although people will be quick to point out that Pixel C throttles its GPU clocks while running for a prolonged time due to heat output, there are a few things to be aware of with Pixel C. Firstly, there's a quad-core A57 CPU cluster at 1.9GHz running alongside it, which on 20nm will consume a whopping 7.39W when fully clocked. Switch's CPU might be expected to only consume around 1.5W, by comparison. Secondly, although I haven't been able to find any decent analysis of Pixel C's GPU throttling, the mentions of it I have found indicate that, although it does throttle, the drop in performance is relatively small, and as it's clocked about 100MHz above Switch to begin with it may only be throttling down to a 750MHz clock or so even under prolonged workloads. There is of course the fact that Pixel C has an aluminium body to allow for easier thermal dissipation, but it likely would have been cheaper (and mechanically much simpler) for Nintendo to adopt the same approach, rather than active cooling.

Alternatively, we can think of it a different way. If Switch has active cooling, then why clock so low? Again assuming 20nm, we know that a full 1GHz clock shouldn't be a problem for active cooling, even with a very small quiet fan, given the Shield TV (which, again, uses a much more power-hungry CPU than Switch). Furthermore, if they wanted a 2.5x ratio between the two clock speeds, that would give a 400MHz clock in portable mode. We know that the TX1, with 2 SMs on 20nm, consumes 1.51W (GPU only) when clocked at about 500MHz. Even assuming that that's a favourable demo for the TX1, at 20% lower clock speed I would be surprised if a 400MHz 2 SM GPU would consume any more than 1.5W. That's obviously well within the bounds for passive cooling, but even being very conservative with battery consumption it shouldn't be an issue. The savings from going from 400MHz to 300MHz would perhaps only increase battery life by about 5-10% tops, which makes it puzzling why they'd turn down the extra performance.

Finally, the recently published Switch patent application actually explicitly talks about running the fan at a lower RPM while in portable mode, and doesn't even mention the possibility of turning it off while running in portable mode. A 2 SM 20nm Maxwell GPU at ~300MHz shouldn't require a fan at all, and although it's possible that they've changed their mind since filing the patent in June, it raises the question of why they would even consider running the fan in portable mode if their target performance was anywhere near this.

Case 2: 3 SMs - Docked: 576 GF FP32 / 1,152 GF FP16 - Portable: 230.4 GF FP32 / 460.8 GF FP16

This is a bit closer to the performance level we've been led to expect, and it does make some sense from the perspective of giving a little bit over TX1 performance at lower power consumption. (It also matches reports of overclocked TX1s in early dev kits, as you'd need to clock a bit over the standard 1GHz to reach docked performance here.) Active cooling while docked makes sense for a 3 SM GPU at 768MHz, although it wouldn't be needed in portable mode. It still leaves the question of why not use 1GHz/400MHz clocks, as even with 3 SMs they should be able to get by with passive cooling at 400MHz, and battery consumption shouldn't be that much of an issue.

Case 3: 4 SMs - Docked: 768 GF FP32 / 1,536 GF FP16 - Portable: 307.2 GF FP32 / 614.4 GF FP16

This would be on the upper limit of what's been expected, performance-wise, and the clock speeds start to make more sense at this point, as portable power consumption for the GPU would be around the 2W mark, so further clock increases may start to affect battery life a bit too much (not that 400-500MHz would be impossible from that point of view, though). Active cooling would be necessary in docked mode, but still shouldn't be needed in portable mode (except perhaps if they go with a beefier CPU config than expected).

Case 4: More than 4 SMs

I'd consider this pretty unlikely, but just from the point of view of "what would you have to do to actually need active cooling in portable mode at these clocks", something like 6 SMs would probably do it (1.15 TF FP32/2.3 TF FP16 docked, 460 GF FP32/920 GF FP16 portable), but I wouldn't count on that. For one, it's well beyond the performance levels that reliable-so-far journalists have told us to expect, but it would also require a much larger die than would be typical for a portable device like this (still much smaller than PS4/XBO SoCs, but that's a very different situation).
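The per-case GFLOPS figures above all follow from one formula: SMs x cores per SM x 2 FLOPs per clock (a fused multiply-add) x clock speed. Here is a minimal sketch of it, assuming 128 FP32 cores per Maxwell SM and double-rate FP16 as on the TX1. The figures quoted in the cases appear to use clocks rounded to 750MHz/300MHz; with the 768MHz docked clock mentioned under Case 2 and a portable clock of 768/2.5 = 307.2MHz, the results come out about 2% higher. The function names are purely illustrative.

```python
CORES_PER_SM = 128          # FP32 CUDA cores in one Maxwell SM
FLOPS_PER_CORE_CLOCK = 2    # one fused multiply-add counts as 2 FLOPs


def gflops_fp32(sms: int, clock_mhz: float) -> float:
    """Peak theoretical FP32 throughput in GFLOPS."""
    return sms * CORES_PER_SM * FLOPS_PER_CORE_CLOCK * clock_mhz / 1000


def gflops_fp16(sms: int, clock_mhz: float) -> float:
    """Peak FP16 throughput, assuming double-rate FP16 (as on TX1)."""
    return 2 * gflops_fp32(sms, clock_mhz)


DOCKED_MHZ = 768.0
PORTABLE_MHZ = DOCKED_MHZ / 2.5   # 307.2MHz, from the 2.5x ratio

for sms in (2, 3, 4, 6):
    rounded = (gflops_fp32(sms, 750), gflops_fp32(sms, 300))
    exact = (gflops_fp32(sms, DOCKED_MHZ), gflops_fp32(sms, PORTABLE_MHZ))
    print(f"{sms} SMs: rounded clocks {rounded[0]:.1f} / {rounded[1]:.1f} GF FP32, "
          f"768/307.2MHz {exact[0]:.1f} / {exact[1]:.1f} GF FP32")
```

Either way these are theoretical peaks, so the small gap between the rounded and exact clocks doesn't change any of the conclusions about which SM count is plausible.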

TL;DR

Each of these numbers is only a single variable in the equation, and we need to know things like CPU configuration, memory bus width, embedded memory pools, number of GPU SMs, etc. to actually fill out the rest of those equations to get the relevant info. Even on the worst end of the spectrum, we're still getting by far the most ambitious portable that Nintendo's ever released, which also doubles as a home console that's noticeably higher performing than Wii U, which is fine by me.

I'd be kinda shocked if a fan needs to run in portable mode. That sounds rather noisy for a handheld gaming device. Plus, more moving parts would make the Switch more fragile than handheld systems in the past.

I mean, do any phones or tablets out there need a fan? (I'm genuinely asking).

Occam's razor says that among competing hypotheses, the one with the fewest assumptions should be selected. I'm just saying...
 

Xellos

Member
I had my expectations at 200-250 GFLOPS portable and 400-500 GFLOPS docked, so this is a bit worse. Portable mode GPU should still be better than a Wii U, but not by a whole lot. At least the CPU and RAM upgrades are present whether portable or docked.

307 MHz portable and sub-X1 performance docked makes me think that Nintendo went with 20nm. Looking forward to the teardown and analysis after launch.
 

Violet_0

Banned
my only problem would be if all the multi-platform games ignore the Nintendo console - again. Because I'm so very very sick of that
 

Fredrik

Member
This is hugely disappointing really. Nintendo is pretty much dropping out of the home console race from my point of view. They're going after the mobile gaming crown instead, which I think is a dumb move, since people already carry mobile gaming machines in their pocket and probably don't want to carry around another annoyingly big thing. This will compete with mobile phones, 3DS and Vita, and will bomb unless it's $199 or lower. Being able to dock a portable console isn't going to help much if the pricing is off from a portable console perspective.
It'll be very interesting to see the third party support and how games like Skyrim run on this thing.
 

LordKano

Member
Thanks for the read! So yeah, again, we need more info.
 

asagami_

Banned
Very much doubt this bit. Those Japanese Vita devs have been primarily targeting PS4 for the last year or two and Switch is probably coming in too low now.

They are targeting PS4 because it sells better than Vita, not because their games are graphically demanding.
 
Power isn't nearly as big a factor in whether or not third parties support this thing as "will people who own it actually buy third party games on the thing" is.

There are still a lot of unknowns about the specs, so we should hold off on the Nintendooming for now. But still.

If games like Skyrim or Dark Souls Trilogy or whatever third party support makes it onto Switch sell well, we'll get more stuff like that. If not, the third party support will dry up like it did almost immediately on the Wii U.

Nintendo could have made a system with Scorpio power, but it wouldn't mean shit if people who owned it didn't buy the third party games that release on the thing. I'm hoping the portability aspect of the system actually draws interest from more people than Nintendo's previous "hooks", like waggle control on the Wii and Gamepad dual screen whatever on the Wii U.
 

Instro

Member
I just think it's continually astounding that Nintendo can come in under the most conservative estimates of what can be done on modern, cheap, off the shelf hardware.

Didn't they downclock the WiiU for better power consumption as well?

I believe there was a temperature issue as well.
 
Thanks for your detailed analysis, Thraktor. I was waiting for your post.
 
I haven't had time to read through every response here, so I'm probably repeating what others have already said, but here are my thoughts on the matter, anyway:

CPU Clock

This isn't really surprising, given (as predicted) CPU clocks stay the same between portable and docked mode to make sure games don't suddenly become CPU limited when running in portable mode.

The overall performance really depends on the core configuration. An octo-core A72 setup at 1GHz would be pretty damn close to PS4's 1.6GHZ 8-core Jaguar CPU. I don't necessarily expect that, but a 4x A72 + 4x A53 @ 1GHz should certainly be able to provide "good enough" performance for ports, and wouldn't be at all unreasonable to expect.

Memory Clock

This is also pretty much as expected as 1.6GHz is pretty much the standard LPDDR4 clock speed (which I guess confirms LPDDR4, not that there was a huge amount of doubt). Clocking down in portable mode is sensible, as lower resolution means smaller framebuffers means less bandwidth needed, so they can squeeze out a bit of extra battery life by cutting it down.

Again, though, the clock speed is only one factor. There are two other things that can come into play here. The second factor, obviously enough, is the bus width of the memory. Basically, you're either looking at a 64 bit bus, for 25.6GB/s, or a 128 bit bus, for 51.2GB/s of bandwidth. The third is any embedded memory pools or cache that are on-die with the CPU and GPU. Nintendo hasn't shied away from large embedded memory pools or cache before (just look at the Wii U's CPU, its GPU, the 3DS SoC, the n3DS SoC, etc., etc.), so it would be quite out of character for them to avoid such customisations this time around. Nvidia's GPU architectures from Maxwell onwards use tile-based rendering, which allows them to use on-die caches to reduce main memory bandwidth consumption, which ties in quite well with Nintendo's habits in this regard. Something like a 4MB L3 victim cache (similar to what Apple uses on their A-series SoCs) could potentially reduce bandwidth requirements by quite a lot, although it's extremely difficult to quantify the precise benefit.

GPU Clock

This is where things get a lot more interesting. To start off, the relationship between the two clock speeds is pretty much as expected. With a target of 1080p in docked mode and 720p in undocked mode, there's a 2.25x difference in pixels to be rendered, so a 2.5x difference in clock speeds would give developers a roughly equivalent amount of GPU performance per pixel in both modes.

Once more, though, and perhaps most importantly in this case, any interpretation of the clock speeds themselves is entirely dependent on the configuration of the GPU, namely the number of SMs (also ROPs, front-end blocks, etc, but we'll assume that they're kept in sensible ratios).

Case 1: 2 SMs - Docked: 384 GF FP32 / 768 GF FP16 - Portable: 153.6 GF FP32 / 307.2 GF FP16

I had generally been assuming that 2 SMs was the most likely configuration (as, I believe, had most people), simply on the basis of allowing for the smallest possible SoC which could meet Nintendo's performance goals. I'm not quite so sure now, for a number of reasons.

Firstly, if Nintendo were to use these clocks with a 2 SM configuration (assuming 20nm), then why bother with active cooling? The Pixel C runs a passively cooled TX1, and although people will be quick to point out that Pixel C throttles its GPU clocks while running for a prolonged time due to heat output, there are a few things to be aware of with Pixel C. Firstly, there's a quad-core A57 CPU cluster at 1.9GHz running alongside it, which on 20nm will consume a whopping 7.39W when fully clocked. Switch's CPU might be expected to only consume around 1.5W, by comparison. Secondly, although I haven't been able to find any decent analysis of Pixel C's GPU throttling, the mentions of it I have found indicate that, although it does throttle, the drop in performance is relatively small, and as it's clocked about 100MHz above Switch to begin with it may only be throttling down to a 750MHz clock or so even under prolonged workloads. There is of course the fact that Pixel C has an aluminium body to allow for easier thermal dissipation, but it likely would have been cheaper (and mechanically much simpler) for Nintendo to adopt the same approach, rather than active cooling.

Alternatively, we can think of it a different way. If Switch has active cooling, then why clock so low? Again assuming 20nm, we know that a full 1GHz clock shouldn't be a problem for active cooling, even with a very small quiet fan, given the Shield TV (which, again, uses a much more power-hungry CPU than Switch). Furthermore, if they wanted a 2.5x ratio between the two clock speeds, that would give a 400MHz clock in portable mode. We know that the TX1, with 2 SMs on 20nm, consumes 1.51W (GPU only) when clocked at about 500MHz. Even assuming that that's a favourable demo for the TX1, at 20% lower clock speed I would be surprised if a 400MHz 2 SM GPU would consume any more than 1.5W. That's obviously well within the bounds for passive cooling, but even being very conservative with battery consumption it shouldn't be an issue. The savings from going from 400MHz to 300MHz would perhaps only increase battery life by about 5-10% tops, which makes it puzzling why they'd turn down the extra performance.
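As a rough sanity check on that power estimate, here is a crude first-order sketch that scales the quoted 1.51W at 500MHz linearly with clock speed. Linear scaling is my assumption, not something from the source: real chips usually also lower voltage at lower clocks (which cuts power further), while leakage doesn't scale with clock at all, so treat the output as a ballpark only.

```python
def scaled_power_w(ref_power_w: float, ref_clock_mhz: float, target_clock_mhz: float) -> float:
    """Scale a reference power figure linearly with clock speed (crude assumption)."""
    return ref_power_w * target_clock_mhz / ref_clock_mhz


if __name__ == "__main__":
    # Reference point from the post: 2 SM TX1 GPU, 1.51W at about 500MHz
    for clock in (400, 300):
        estimate = scaled_power_w(1.51, 500, clock)
        print(f"{clock}MHz: ~{estimate:.2f}W (linear estimate)")
```

Even this simple model lands below the 1.5W ceiling suggested for a 400MHz 2 SM GPU.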

Finally, the recently published Switch patent application explicitly talks about running the fan at a lower RPM while in portable mode, and doesn't even mention the possibility of turning it off in that mode. A 2 SM 20nm Maxwell GPU at ~300MHz shouldn't require a fan at all, and although it's possible that they've changed their mind since filing the patent in June, it raises the question of why they would even consider running the fan in portable mode if their target performance was anywhere near this.

Case 2: 3 SMs - Docked: 576 GF FP32 / 1,152 GF FP16 - Portable: 230.4 GF FP32 / 460.8 GF FP16

This is a bit closer to the performance level we've been led to expect, and it makes some sense from the perspective of giving a little over TX1 performance at lower power consumption. (It also matches reports of overclocked TX1s in early dev kits, as you'd need to clock a bit over the standard 1GHz to reach docked performance here.) Active cooling while docked makes sense for a 3 SM GPU at 768MHz, although it wouldn't be needed in portable mode. It still leaves the question of why not use 1GHz/400MHz clocks, as even with 3 SMs they should be able to get by with passive cooling at 400MHz, and battery consumption shouldn't be that much of an issue.

Case 3: 4 SMs - Docked: 768 GF FP32 / 1,536 GF FP16 - Portable: 307.2 GF FP32 / 614.4 GF FP16

This would be on the upper limit of what's been expected, performance-wise, and the clock speeds start to make more sense at this point, as portable power consumption for the GPU would be around the 2W mark, so further clock increases may start to affect battery life a bit too much (not that 400-500MHz would be impossible from that point of view, though). Active cooling would be necessary in docked mode, but still shouldn't be needed in portable mode (except perhaps if they go with a beefier CPU config than expected).

Case 4: More than 4 SMs

I'd consider this pretty unlikely, but just from the point of view of "what would you have to do to actually need active cooling in portable mode at these clocks", something like 6 SMs would probably do it (1.15 TF FP32/2.3 TF FP16 docked, 460 GF FP32/920 GF FP16 portable), but I wouldn't count on that. For one, it's well beyond the performance levels that reliable-so-far journalists have told us to expect, but it would also require a much larger die than would be typical for a portable device like this (still much smaller than PS4/XBO SoCs, but that's a very different situation).
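The figures in the four cases above can be reproduced with a short sketch. The assumptions are mine: 128 CUDA cores per Maxwell SM, 2 FLOPs per core per clock (one fused multiply-add), FP16 at double the FP32 rate, and clocks rounded to 750MHz docked / 300MHz portable, which is what the quoted numbers appear to be based on. Plugging in 768MHz instead gives slightly higher results (about 393 GF FP32 for 2 SMs docked).

```python
CORES_PER_SM = 128             # assumption: Maxwell SM layout
FLOPS_PER_CORE_PER_CYCLE = 2   # assumption: one fused multiply-add per clock


def fp32_gflops(sm_count: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS."""
    return sm_count * CORES_PER_SM * FLOPS_PER_CORE_PER_CYCLE * clock_mhz / 1000


def fp16_gflops(sm_count: int, clock_mhz: float) -> float:
    """Theoretical peak FP16 throughput, assuming double-rate FP16."""
    return 2 * fp32_gflops(sm_count, clock_mhz)


if __name__ == "__main__":
    docked_mhz, portable_mhz = 750, 300   # rounded clocks inferred from the post's figures
    for sms in (2, 3, 4, 6):
        print(
            f"{sms} SMs - "
            f"Docked: {fp32_gflops(sms, docked_mhz):.1f} GF FP32 / "
            f"{fp16_gflops(sms, docked_mhz):.1f} GF FP16 - "
            f"Portable: {fp32_gflops(sms, portable_mhz):.1f} GF FP32 / "
            f"{fp16_gflops(sms, portable_mhz):.1f} GF FP16"
        )
```

These are theoretical peaks only; real performance also depends on ROPs, bandwidth and everything else discussed above.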

TL;DR

Each of these numbers is only a single variable in the equation, and we need to know things like CPU configuration, memory bus width, embedded memory pools, number of GPU SMs, etc. to actually fill out the rest of those equations to get the relevant info. Even on the worst end of the spectrum, we're still getting by far the most ambitious portable that Nintendo's ever released, which also doubles as a home console that's noticeably higher performing than Wii U, which is fine by me.

Thanks for the read, really insightful. At this point I'm just going to avoid participating in these threads, and wait for the reveal. If what I see isn't visually appealing to me, then my plans will Switch.
 

Hilarion

Member
Yes, of course it will have a good library. But those games would be better with better specs.

Would they? Visuals have zip to do with game quality. There are 30 year old games that are superior to games released this year because resolution, aliasing, audio quality, etc have nothing to do with whether a game is enjoyable or not. Look at Pac Man: it's a better video game than most of the games of 2016 and it is as primitive as it gets.
 

wildfire

Banned
Half an Xbox One then ? Isn't that what we were all expecting ? There's probably something I'm missing.

No.

People interested in the Switch expected the XB1 to be around 50% better than the Switch, not double. That 50% difference isn't even Shield TV at its best.

For the docked mode to be so much lower is a disappointment. It's understandable that they wanted parity between docked and downclocked handheld but the outcome reveals the screen should've been 540p in the first place.
 

hatchx

Banned
I don't know a lot about tech, but this seems a lot worse than expected.

I guess I don't understand why companies like Take 2, Bethesda, and FROM software have said good things about the Switch? If everyone on here is having meltdowns, why is developer reception so much stronger than it was with the WiiU?

I suppose the good news here is that lower power draw means better battery and cheaper price. I can't help but feel a bit disappointed that base PS4/X1 games will need downgrading for Switch.
 

Metal B

Member
you can apply that logic to every game, on every console
Of course, imagine what a hit this game would have been today if it had better graphics:
[Image: Minecraft_WiiU_MashupPack_Mario_Shot6.png]
 

Cday

Banned
Why would it be cheaper when they're just under-utilizing the chip? It still has the chip in it. They're still paying for the chip. The only thing I can think of is the chips can be higher yield since they won't need to function as highly as a regular one. In other words they won't need to be as free of manufacturing defects.
 

asagami_

Banned
I don't know a lot about tech, but this seems a lot worse than expected.

I guess I don't understand why companies like Take 2, Bethesda, and FROM software have said good things about the Switch? If everyone on here is having meltdowns, why is developer reception so much stronger than it was with the WiiU?

Because people see numbers and don't really understand what the hell they mean.
 

lt519

Member
Assuming a downclocked TX1, on handheld mode:

1) ~Wii U.
2) iPhone 7 > Switch > iPhone 6S.
3) Switch >= 8xVita.

I'm so OK with this, dunno why people are freaking out. That is a beast of a handheld if they are getting 5-8 hours of battery playing full Wii U+ games. An iPhone 7 can barely muster 4 hours of Clash of Clans.

TechRadar said:
As was the case earlier in its life, playing 3D games really hit this 'mature' iPhone 6's battery where it hurts. Playing 10 minutes of Freeblade – the graphically-rich shooter Apple used to show off the iPhone 6S at launch – sapped around 15% of the battery's charge.
 

Anth0ny

Member
I don't know a lot about tech, but this seems a lot worse than expected.

I guess I don't understand why companies like Take 2, Bethesda, and FROM software have said good things about the Switch? If everyone on here is having meltdowns, why is developer reception so much stronger than it was with the WiiU?

I suppose the good news here is that lower power draw means better battery and cheaper price. I can't help but feel a bit disappointed that base PS4/X1 games will need downgrading for Switch.

third parties always have good things to say about new hardware before it actually comes out

i'd be shocked if any of them are seriously supporting the thing within a year of launch
 

Oregano

Member
They are targeting PS4 because it sells better than Vita, not because their games are pushing it graphically.

Actually the Vita SKUs still sell better even when they look and run like crap.

I'm so OK with this, dunno why people are freaking out. That is a beast of a handheld if they are getting 5-8 hours of battery playing full Wii U+ games. An iPhone 7 can barely muster 4 hours of Clash of Clans.

You're not getting 5-8 hours of battery though, the Switch has a fan eating up battery life.
 
Well...because many of us don't give a single fuck about mobile gaming. The real question is why do so many seem to believe they can deflect console expectations, criticisms and disappointments with mobile stats, as if mobile and consoles are the same thing, bought by the same audiences for the same purpose.

I'm sure this won't surprise you, but many of us are grown ass adults that drive to work and don't have enough interest (or opportunity) to game while mobile anymore. We work at work and game at home where we have full access to a TV/monitor which can give us a superior experience to the often hand-cramping, poor resolution experience that typifies mobile gaming (at least on Nintendo and Sony handhelds). Mobile was good shit K-12 and through part of college, but that was the end of that for me. Maybe if I lived in a subway city, I'd still have use for mobile gaming, but I don't.

So yea, for those of us who only play at home on a proper console, Nintendo hasn't offered a platform with good 3rd party support since the SNES. It is what it is. No need to attempt to spin that reality; I've been playing on Nintendo platforms longer than half this forum has been alive. I know what time it is.

This is surprising since I didn't know you spoke for all grown ass adults. If this is your personal opinion fine, but you do not speak for anyone but yourself. You mentioning how you and others are "grown ass adults" does not make your opinion/preference any more valid than others.
 

Yasumi

Banned
They are targeting PS4 because it sells better than Vita, not because their games are pushing it graphically.

They're targeting it due to vaguely similar demographics, that don't really exist on Nintendo ecosystems. Unless there's a noticeably huge shift from Vita->Switch, those developers will likely continue hovering around Sony consoles.
 

fireflame

Member
300 dollars would really be expensive for a hybrid console that is a bit underpowered. The 3DS struggled at 250, and even though the Switch has a dock, is it enough to convince people to buy it at such a high price? In any case, I feel the safest move for players is to wait to see if the console is selling and is getting games, and then make a decision. Wait and see if they bring the games you want, if third parties keep supporting the console, etc.
 

phanphare

Banned
Quoting this again because, as I've said in this thread and other threads, the narrative that's being spun all of a sudden that this is a 3DS successor and portable console is revisionist to the max.

This is not a dedicated handheld.

like many people have said before it's both a Wii U and 3DS successor

it's also a dedicated gaming device that can be used primarily as a handheld so it kind of is
 

ASIS

Member
Thanks for the write-up (I'll pretend I understood what you wrote, but it does sound more positive than what others have been saying).

Two questions:

1. How much higher than Wii U are we talking here regarding each scenario? Most people are fine with the portable aspect, but it's the home console aspect that is making people wary. How does it compare to XB1 and what should we expect from ports to Switch? Is it merely less graphical fidelity or will it be something more severe?

2. It seems that active cooling is what's not making any sense for the Switch, but then I wonder, what is the possibility of it simply being badly designed? It might seem like crazy talk given that it's Nintendo we are talking about here. But this is really the simplest reason for it.
 

AmyS

Member
Thank you for writing this. Very informative.

So GAF, what is the most likely: 2 SMs or 3 SMs ?
 

hatchx

Banned
As a handheld it's fine. As a console it's a trash fire.


Why are they aiming for such low specs when there's better alternatives available? Is it for price? battery? weight?

A lot of the armchair analysts here seem to think they could have packed a lot more power in. Why wouldn't they?
 
Does Nintendo have any other options when you have to balance these 3?

1. Battery life
2. Heat
3. Price

I kinda wish Nintendo went for a separate handheld and console that share the same library. That way they could have had the best of both worlds.
 

Hilarion

Member
Nice to see the "only gameplay matters" crowd is still around to downplay issues on low power Nintendo hardware.

I have yet to see a game whose quality has been affected by resolution or even visuals. One of my favorite games is The Hitchhiker's Guide to the Galaxy game from 1984 and that game is text only.

If a game is fun, it is good. If it isn't, it isn't.
 

Metal B

Member
Assuming a downclocked TX1, on handheld mode:
2) iPhone 7 > Switch > iPhone 6S.
This comparison is bad, since the Switch is designed to run consistently for longer. Mobile devices get hotter after long periods (say one hour) and lose power so as not to overheat.
 

asagami_

Banned
They're targeting it due to vaguely similar demographics, that don't really exist on Nintendo ecosystems. Unless there's a noticeably huge shift from Vita->Switch, those developers will likely continue hovering around Sony consoles.

And it's fine, actually. I'm sure many Japanese devs of PS3/PS4 games want to launch games on a handheld device. Some are launched on Vita, but with lower framerates (or definitely don't reach outside Japan). Switch is a nice solution to their problem.
 

Panajev2001a

GAF's Pleasant Genius
Fuck that i was expecting to match the X1 at least when in docked mode... i didn't even get that!

Considering that Nvidia will be launching the successor to the X2 in the next Shield TV revision soon it looks like they treated Nintendo with the same special tree lantern they reserved for Sony at least ;).
 

RibMan

Member
We've suspected the final specs would be significantly outdated again, but if DF sources are right, it looks like the Switch will be in the same power range as the Wii U. Disappointing if true, but hopefully these specs mean the Switch won't be expensive. If it's priced at $159.99 or less then I don't think the low specs will matter that much -- especially if a lot of parents want to get one for their kid(s). At this point, I think I'm more interested in knowing if the final screen is really just 720p, and whether or not it has any multi-touch capabilities.

My favourite posts on GAF are those from simple people who likely know nothing tech-wise, and especially on the marketing side, and yet claim that Nintendo never learns, that they are infinitely more stupid than themselves. Pretending that something so obvious a child could understand it is beyond the whole R&D division of Nintendo.

Pretentious as fuck.



I don't think you need a doctorate in Nintendo decisions to know that Nintendo has done this before and the outcome was extremely undesirable. The power capabilities of the Switch could result in another round of headaches and disinterest from developers, publishers, and consumers. I don't know about you, but I think Nintendo should avoid creating a sequel to the Wii U.

We're still dealing in hypotheticals and rumors, so everything we know could change at the January event. For a portable, the Switch seems like it will beat the Vita in performance but fall short of modern smartphones. For a console, it doesn't seem like it will be competitive against any of the current or future hardware releases. This could be a good thing, but looking at what happened with the Wii U, you can begin to understand why a lot of people believe this could be a bad thing. A very bad thing.
 

killroy87

Member
Why are they aiming for such low specs when there's better alternatives available? Is it for price? battery? weight?

A lot of the armchair analysts here seem to think they could have packed a lot more power in. Why wouldn't they?

Price, to answer your question. Things like battery are also a factor, but they could just put in a better battery, which in turn would affect price.

They clearly want this thing to be cheap. Which is fine. But people really, truly need to set their expectations accordingly. There's a reason an iPad costs what it does.
 

Hydrus

Member
So once again Nintendo hardware will be under powered, full of old ports, and overpriced. SMH Nintendo. I was already cautious about buying one after getting burnt on the Wii U. Now my interest just completely dropped. This thing is gonna have to have a killer launch line up of NEW games, not Wii U/ 3DS ports, and a sub $200 price tag if I'm gonna even entertain the idea of buying one.
 

lt519

Member
You're not getting 5-8 hours of battery though, the Switch has a fan eating up battery life.

I guess it was Drake that gave the 5-8 hour number, so who knows now. But the slower speeds seem to imply they are gunning for battery life over performance.

So once again Nintendo will be under powered, full of old ports, and overpriced. SMH Nintendo. I was already cautious about buying one after getting burnt on the Wii U. Now my interest just completely dropped. This thing is gonna have to have a killer launch line up of NEW games, not Wii U/ 3DS ports, and a sub $200 price tag if I'm gonna even entertain the idea of buying one.

lol @ all the "Give me PS4+ performance in a handheld for under $200 or I'm not buying!"
 

wildfire

Banned
GPU Clock

Case 1: 2 SMs - Docked: 384 GF FP32 / 768 GF FP16 - Portable: 153.6 GF FP32 / 307.2 GF FP16

I had generally been assuming that 2 SMs was the most likely configuration (as, I believe, had most people), simply on the basis of allowing for the smallest possible SoC which could meet Nintendo's performance goals. I'm not quite so sure now, for a number of reasons.

Firstly, if Nintendo were to use these clocks with a 2 SM configuration (assuming 20nm), then why bother with active cooling? The Pixel C runs a passively cooled TX1, and although people will be quick to point out that Pixel C throttles its GPU clocks while running for a prolonged time due to heat output, there are a few things to be aware of with Pixel C. Firstly, there's a quad-core A57 CPU cluster at 1.9GHz running alongside it, which on 20nm will consume a whopping 7.39W when fully clocked. Switch's CPU might be expected to only consume around 1.5W, by comparison. Secondly, although I haven't been able to find any decent analysis of Pixel C's GPU throttling, the mentions of it I have found indicate that, although it does throttle, the drop in performance is relatively small, and as it's clocked about 100MHz above Switch to begin with it may only be throttling down to a 750MHz clock or so even under prolonged workloads. There is of course the fact that Pixel C has an aluminium body to allow for easier thermal dissipation, but it likely would have been cheaper (and mechanically much simpler) for Nintendo to adopt the same approach, rather than active cooling.

Alternatively, we can think of it a different way. If Switch has active cooling, then why clock so low? Again assuming 20nm, we know that a full 1GHz clock shouldn't be a problem for active cooling, even with a very small quiet fan, given the Shield TV (which, again, uses a much more power-hungry CPU than Switch). Furthermore, if they wanted a 2.5x ratio between the two clock speeds, that would give a 400MHz clock in portable mode. We know that the TX1, with 2 SMs on 20nm, consumes 1.51W (GPU only) when clocked at about 500MHz. Even assuming that that's a favourable demo for the TX1, at 20% lower clock speed I would be surprised if a 400MHz 2 SM GPU would consume any more than 1.5W. That's obviously well within the bounds for passive cooling, but even being very conservative with battery consumption it shouldn't be an issue. The savings from going from 400MHz to 300MHz would perhaps only increase battery life by about 5-10% tops, which makes it puzzling why they'd turn down the extra performance.

Finally, the recently published Switch patent application actually explicitly talks about running the fan at a lower RPM while in portable mode, and doesn't even mention the possibility of turning it off while running in portable mode. A 2 SM 20nm Maxwell GPU at ~300MHz shouldn't require a fan at all, and although it's possible that they've changed their mind since filing the patent in June, it begs the question of why they would even consider running the fan in portable mode if their target performance was anywhere near this.

Case 2: 3 SMs - Docked: 576 GF FP32 / 1,152 GF FP16 - Portable: 230.4 GF FP32 / 460.8 GF FP16

This is a bit closer to the performance level we've been led to expect, and it does make a little bit of sense from the perspective of giving a little bit over TX1 performance at lower power consumption. (It also matches reports of overclocked TX1s in early dev kits, as you'd need to clock a bit over the standard 1GHz to reach docked performance here.) Active cooling while docked makes sense for a 3 SM GPU at 768MHz, although wouldn't be needed in portable mode. It still leaves the question of why not use 1GHz/400MHz clocks, as even with 3 SMs they should be able to get by with passive cooling at 400MHz, and battery consumption shouldn't be that much of an issue.

Well, hopefully it is 3 or 4 SMs. It's the only way to turn this bad news into something acceptable.
 
I don't know a lot about tech, but this seems a lot worse than expected.

I guess I don't understand why companies like Take 2, Bethesda, and FromSoftware have said good things about the Switch? If everyone on here is having meltdowns, why is developer reception so much stronger than it was with the Wii U?

I suppose the good news here is that lower power draw means better battery and cheaper price. I can't help but feel a bit disappointed that base PS4/X1 games will need downgrading for Switch.

They are probably saying good things because they will finally have a handheld platform with decent specs to release their games onto, they didn't have that previously.
 

fireflame

Member
I was linked some videos about Wii U that showed 3rd parties were all excited before its launch, and on gaming websites, many people at western companies like Take Two, EA, etc. were enthusiastic. But then all the rats left the ship.
 

guek

Banned
I haven't had time to read through every response here, so I'm probably repeating what others have already said, but here are my thoughts on the matter, anyway:

CPU Clock

This isn't really surprising, given (as predicted) CPU clocks stay the same between portable and docked mode to make sure games don't suddenly become CPU limited when running in portable mode.

The overall performance really depends on the core configuration. An octo-core A72 setup at 1GHz would be pretty damn close to PS4's 1.6GHz 8-core Jaguar CPU. I don't necessarily expect that, but a 4x A72 + 4x A53 @ 1GHz should certainly be able to provide "good enough" performance for ports, and wouldn't be at all unreasonable to expect.

Memory Clock

This is also pretty much as expected as 1.6GHz is pretty much the standard LPDDR4 clock speed (which I guess confirms LPDDR4, not that there was a huge amount of doubt). Clocking down in portable mode is sensible, as lower resolution means smaller framebuffers means less bandwidth needed, so they can squeeze out a bit of extra battery life by cutting it down.

Again, though, the clock speed is only one factor. There are two other things that can come into play here. The second factor, obviously enough, is the bus width of the memory. Basically, you're either looking at a 64 bit bus, for 25.6GB/s, or a 128 bit bus, for 51.2GB/s of bandwidth. The third is any embedded memory pools or cache that are on-die with the CPU and GPU. Nintendo hasn't shied away from large embedded memory pools or cache before (just look at the Wii U's CPU, its GPU, the 3DS SoC, the n3DS SoC, etc., etc.), so it would be quite out of character for them to avoid such customisations this time around. Nvidia's GPU architectures from Maxwell onwards use tile-based rendering, which allows them to use on-die caches to reduce main memory bandwidth consumption, which ties in quite well with Nintendo's habits in this regard. Something like a 4MB L3 victim cache (similar to what Apple uses on their A-series SoCs) could potentially reduce bandwidth requirements by quite a lot, although it's extremely difficult to quantify the precise benefit.
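
For anyone who wants to check the bandwidth arithmetic above, here's a rough sketch in Python. It assumes the 1.6GHz figure is the memory I/O clock and that LPDDR4 moves data twice per clock (double data rate), which is what makes the 25.6GB/s and 51.2GB/s figures come out. The function name and parameters are made up for illustration, and embedded memory or caches aren't modelled at all.

```python
def peak_bandwidth_gb_s(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is the double-data-rate assumption.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# The two bus widths discussed in the post, at a 1600MHz memory clock.
for width in (64, 128):
    print(f"{width}-bit bus: {peak_bandwidth_gb_s(1600, width):.1f} GB/s")
```

These are theoretical peaks only; real-world bandwidth will be lower, and any on-die cache would change how much of it a game actually needs.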

GPU Clock

This is where things get a lot more interesting. To start off, the relationship between the two clock speeds is pretty much as expected. With a target of 1080p in docked mode and 720p in undocked mode, there's a 2.25x difference in pixels to be rendered, so a 2.5x difference in clock speeds would give developers a roughly equivalent amount of GPU performance per pixel in both modes.
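
A quick sanity check of the pixel ratio versus the clock ratio, as a small Python sketch. The resolutions are the 1080p/720p targets mentioned above, the 2.5x clock ratio is taken straight from the post, and the variable names are just illustrative.

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


docked_pixels = pixel_count(1920, 1080)    # 1080p target when docked
portable_pixels = pixel_count(1280, 720)   # 720p target when portable

pixel_ratio = docked_pixels / portable_pixels   # how many more pixels docked mode renders
clock_ratio = 2.5                               # docked / portable GPU clock, per the post

# How much more GPU throughput each pixel gets in docked mode.
docked_per_pixel_advantage = clock_ratio / pixel_ratio

print(f"pixel ratio: {pixel_ratio:.2f}x")
print(f"docked per-pixel advantage: {docked_per_pixel_advantage:.2f}x")
```

So under these assumptions docked mode ends up with slightly more GPU time per pixel than portable mode, which fits the "roughly equivalent" description.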

Once more, though, and perhaps most importantly in this case, any interpretation of the clock speeds themselves is entirely dependent on the configuration of the GPU, namely the number of SMs (also ROPs, front-end blocks, etc, but we'll assume that they're kept in sensible ratios).

Case 1: 2 SMs - Docked: 384 GF FP32 / 768 GF FP16 - Portable: 153.6 GF FP32 / 307.2 GF FP16

I had generally been assuming that 2 SMs was the most likely configuration (as, I believe, had most people), simply on the basis of allowing for the smallest possible SoC which could meet Nintendo's performance goals. I'm not quite so sure now, for a number of reasons.

Firstly, if Nintendo were to use these clocks with a 2 SM configuration (assuming 20nm), then why bother with active cooling? The Pixel C runs a passively cooled TX1, and although people will be quick to point out that Pixel C throttles its GPU clocks while running for a prolonged time due to heat output, there are a few things to be aware of with Pixel C. Firstly, there's a quad-core A57 CPU cluster at 1.9GHz running alongside it, which on 20nm will consume a whopping 7.39W when fully clocked. Switch's CPU might be expected to only consume around 1.5W, by comparison. Secondly, although I haven't been able to find any decent analysis of Pixel C's GPU throttling, the mentions of it I have found indicate that, although it does throttle, the drop in performance is relatively small, and as it's clocked about 100MHz above Switch to begin with it may only be throttling down to a 750MHz clock or so even under prolonged workloads. There is of course the fact that Pixel C has an aluminium body to allow for easier thermal dissipation, but it likely would have been cheaper (and mechanically much simpler) for Nintendo to adopt the same approach, rather than active cooling.

Alternatively, we can think of it a different way. If Switch has active cooling, then why clock so low? Again assuming 20nm, we know that a full 1GHz clock shouldn't be a problem for active cooling, even with a very small quiet fan, given the Shield TV (which, again, uses a much more power-hungry CPU than Switch). Furthermore, if they wanted a 2.5x ratio between the two clock speeds, that would give a 400MHz clock in portable mode. We know that the TX1, with 2 SMs on 20nm, consumes 1.51W (GPU only) when clocked at about 500MHz. Even assuming that that's a favourable demo for the TX1, at 20% lower clock speed I would be surprised if a 400MHz 2 SM GPU would consume any more than 1.5W. That's obviously well within the bounds for passive cooling, but even being very conservative with battery consumption it shouldn't be an issue. The savings from going from 400MHz to 300MHz would perhaps only increase battery life by about 5-10% tops, which makes it puzzling why they'd turn down the extra performance.

Finally, the recently published Switch patent application actually explicitly talks about running the fan at a lower RPM while in portable mode, and doesn't even mention the possibility of turning it off while running in portable mode. A 2 SM 20nm Maxwell GPU at ~300MHz shouldn't require a fan at all, and although it's possible that they've changed their mind since filing the patent in June, it begs the question of why they would even consider running the fan in portable mode if their target performance was anywhere near this.

Case 2: 3 SMs - Docked: 576 GF FP32 / 1,152 GF FP16 - Portable: 230.4 GF FP32 / 460.8 GF FP16

This is a bit closer to the performance level we've been led to expect, and it does make a little bit of sense from the perspective of giving a little bit over TX1 performance at lower power consumption. (It also matches reports of overclocked TX1s in early dev kits, as you'd need to clock a bit over the standard 1GHz to reach docked performance here.) Active cooling while docked makes sense for a 3 SM GPU at 768MHz, although wouldn't be needed in portable mode. It still leaves the question of why not use 1GHz/400MHz clocks, as even with 3 SMs they should be able to get by with passive cooling at 400MHz, and battery consumption shouldn't be that much of an issue.

Case 3: 4 SMs - Docked: 768 GF FP32 / 1,536 GF FP16 - Portable: 307.2 GF FP32 / 614.4 GF FP16

This would be on the upper limit of what's been expected, performance-wise, and the clock speeds start to make more sense at this point, as portable power consumption for the GPU would be around the 2W mark, so further clock increases may start to affect battery life a bit too much (not that 400-500MHz would be impossible from that point of view, though). Active cooling would be necessary in docked mode, but still shouldn't be needed in portable mode (except perhaps if they go with a beefier CPU config than expected).

Case 4: More than 4 SMs

I'd consider this pretty unlikely, but just from the point of view of "what would you have to do to actually need active cooling in portable mode at these clocks", something like 6 SMs would probably do it (1.15 TF FP32/2.3 TF FP16 docked, 460 GF FP32/920 GF FP16 portable), but I wouldn't count on that. For one, it's well beyond the performance levels that reliable-so-far journalists have told us to expect, but it would also require a much larger die than would be typical for a portable device like this (still much smaller than PS4/XBO SoCs, but that's a very different situation).
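
If you want to reproduce the GFLOPS figures for each case, here's a minimal Python sketch. It assumes 128 FP32 cores per Maxwell SM, one fused multiply-add (2 FLOPs) per core per clock, and FP16 running at twice the FP32 rate. The 750MHz/300MHz clocks are my inference from the figures quoted above (they reproduce them exactly) rather than something the post states; plugging in the 768MHz mentioned for docked mode gives numbers about 2.4% higher. All names here are hypothetical.

```python
CORES_PER_SM = 128             # assumption: FP32 cores per Maxwell SM
FLOPS_PER_CORE_PER_CLOCK = 2   # assumption: one FMA counts as 2 FLOPs


def gflops(sms, clock_mhz, fp16=False):
    """Theoretical peak GFLOPS for a given SM count and GPU clock in MHz."""
    fp32 = sms * CORES_PER_SM * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000
    # assumption: FP16 runs at double the FP32 rate
    return fp32 * 2 if fp16 else fp32


DOCKED_MHZ = 750    # inferred from the quoted figures, not an official clock
PORTABLE_MHZ = 300  # inferred likewise

for sms in (2, 3, 4, 6):
    print(
        f"{sms} SMs - "
        f"Docked: {gflops(sms, DOCKED_MHZ):.1f} GF FP32 / "
        f"{gflops(sms, DOCKED_MHZ, fp16=True):.1f} GF FP16 - "
        f"Portable: {gflops(sms, PORTABLE_MHZ):.1f} GF FP32 / "
        f"{gflops(sms, PORTABLE_MHZ, fp16=True):.1f} GF FP16"
    )
```

These are peak theoretical numbers only; as the rest of the post argues, they say nothing about bandwidth, CPU limits or thermals.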

TL;DR

Each of these numbers is only a single variable in the equation, and we need to know things like CPU configuration, memory bus width, embedded memory pools, number of GPU SMs, etc. to actually fill out the rest of those equations to get the relevant info. Even on the worst end of the spectrum, we're still getting by far the most ambitious portable that Nintendo's ever released, which also doubles as a home console that's noticeably higher performing than Wii U, which is fine by me.

Sounds good to me
 