Some questions.
128-bit bus + 32MB eSRAM vs 256-bit bus with no eSRAM: which is cheaper in the long run?
LPDDR4 vs DDR4: which should NX use?
As PS4 and XBO have shown us, a single high-bandwidth pool (e.g. GDDR5) will generally be the more cost-effective option compared to SRAM plus low-bandwidth DDR. It's also worth noting that if you wanted to build a console that properly targeted 1080p you'd really want 64MB of SRAM rather than the 32MB that XBO uses. If you wanted a PS4-competitive device with a 64MB pool of SRAM on your SoC, the die size would balloon to over 500mm² (i.e. somewhere between the size of an R9 390X and a Fury X). There's no way that's cheaper than just using 8GB of GDDR5.
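As a rough sanity check on that die size claim, here's a back-of-envelope estimate; the ~2.5mm² per MB of on-die SRAM at 28nm is a ballpark assumption based on XBO die shots, and PS4's ~348mm² APU is used as the baseline, so treat the output as an order-of-magnitude figure rather than anything precise:

```python
# Back-of-envelope die size estimate for a PS4-class APU with 64MB of on-die SRAM.
# All inputs are rough assumptions, not measured figures.
PS4_APU_MM2 = 348        # approx. PS4 APU die size at 28nm
SRAM_MM2_PER_MB = 2.5    # ballpark eSRAM area cost at 28nm, incl. control logic

sram_mb = 64
estimated_die = PS4_APU_MM2 + sram_mb * SRAM_MM2_PER_MB
print(f"~{estimated_die:.0f} mm^2")  # ~508 mm^2, between Hawaii (~438) and Fiji (~596)
```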
Regarding DDR4 and LPDDR4, despite their similar names they have quite different applications in a games console.
DDR4 could, at the fastest available speeds (3200 MT/s), achieve around 100GB/s on a 256-bit bus. The problem is that DDR4 chips are generally only available in 16-bit I/O form, which means you'd need 16 DDR4 chips on your motherboard to achieve that. As willing as I am to consider that Nintendo might use a wider memory interface, I'd be very surprised if they dropped 16 RAM chips onto their motherboard (and they would be very unlikely to be able to reduce that number in future revisions). DDR4 is, however, a lot cheaper than GDDR5 or LPDDR4, so in the unlikely event that Nintendo decides they need 12GB or 16GB of RAM in NX, it would be a lot cheaper to do so with DDR4 than with LPDDR4 or GDDR5.
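The numbers above fall straight out of transfer rate × bus width; a quick back-of-envelope check, using the DDR4-3200 and 256-bit figures from the paragraph above:

```python
# Peak bandwidth = transfer rate (MT/s) * bus width (bits) / 8, converted to GB/s.
def peak_bandwidth_gbs(mt_per_s, bus_bits):
    return mt_per_s * bus_bits / 8 / 1000

print(peak_bandwidth_gbs(3200, 256))  # 102.4 GB/s for DDR4-3200 on a 256-bit bus
print(256 // 16)                      # 16 chips needed if each chip has 16-bit I/O
```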
LPDDR4 has only slightly faster available speeds than DDR4 (3733 MT/s), but far higher per-chip bandwidth due to a much wider interface (typically 64-bit I/O per chip), and it even competes with GDDR5 on a per-chip bandwidth measure while being available in much larger capacities. It seems to me more likely as a substitute for GDDR5 (i.e. a single high-bandwidth pool) than for DDR4 (a secondary low-bandwidth pool), and is possibly more expensive than GDDR5 for a similar capacity and bandwidth. It has a couple of advantages over GDDR5, though: the first is that it can achieve sufficient bandwidth and capacity with fewer chips, and the second is much lower power consumption. This isn't just a reduction in peak power draw, but a massive reduction in standby power draw. If Nintendo wants to implement instant wake from standby and suspend-resume on NX, they'll need to keep the RAM powered while in standby mode. With something like GDDR5 this consumes a lot of power (it's not a small contributor to PS4's standby power draw), whereas LPDDR4 would allow Nintendo to achieve the same functionality with a standby power draw potentially not much higher than Wii U's.
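Per chip, the comparison looks roughly like this; the interface widths and speeds are typical figures (a 64-bit LPDDR4 package versus a 32-bit GDDR5 chip at 7 Gbps), not anything specific to a rumoured NX part:

```python
# Per-chip peak bandwidth: transfer rate (MT/s) * interface width (bits) / 8 -> GB/s
def chip_bandwidth_gbs(mt_per_s, io_bits):
    return mt_per_s * io_bits / 8 / 1000

print(chip_bandwidth_gbs(3733, 64))      # ~29.9 GB/s per LPDDR4 package (2 x 32-bit channels)
print(chip_bandwidth_gbs(7000, 32))      # ~28.0 GB/s per GDDR5 chip at 7 Gbps
print(4 * chip_bandwidth_gbs(3733, 64))  # ~119 GB/s from just four LPDDR4 packages
```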
Thanks for responding, Thraktor.
I wasn't aware of the dilemma Nintendo is facing with eDRAM going forward. It'll be interesting to see what solution they settle on. I don't really want what, architecture-wise, amounts to a PS4/XB1 in a Nintendo-labelled casing. It's a bit more fun seeing unique designs with their various strengths and weaknesses. If the hardware is up to participating this time around, I hope the software side of development is far more welcoming to developers than it ever was. Nintendo can never seem to achieve both bullet points simultaneously, haha.
It's worth keeping in mind that Nintendo have only ever designed two 3D home consoles from scratch: the N64 and the Gamecube. Wii was obviously an upgraded Gamecube, and Wii U's requirement for Wii BC meant following many of the same architectural choices (e.g. a PPC CPU and split memory pools). Of those two, the N64 had a single unified memory pool and the Gamecube had split pools. If Nintendo is dropping Wii U BC and building NX from scratch, it's difficult to say that they have a pattern of choosing split memory pools; they did it once 15 years ago and have been following on from that decision since.
Of course, 3DS uses separate VRAM pools as well, although in that case the FCRAM used as the larger memory pool was, as far as I'm aware, the highest-bandwidth solution available to them, so if they felt they needed more bandwidth for the framebuffer then on-die SRAM was pretty much the only option.
OK, new pessimistic NX configurations, in ascending order of credibility (a quick FLOPS sanity check follows the list):
1:
2 x 4-core Puma at 1.8+ GHz
4GB of GDDR5, 128-bit bus, 96 GB/s
640:40:16 GCN 1.1 GPU at 800MHz, 1024 GFLOPS
2:
2 x 4-core Puma at 1.8+ GHz
8GB of DDR4 / LPDDR4, 128-bit bus, 50/60 GB/s
640:40:16 GCN 1.2 GPU at 600MHz, 768 GFLOPS
3:
2 x 4-core Puma at 1.8+ GHz
8GB of DDR4 / LPDDR4, 256-bit bus, 100/120 GB/s
640:40:16 GCN 1.1 GPU at 800MHz, 1024 GFLOPS
Optimistic version, assuming a higher TDP, 14nm fab, and GDDR5X:
2 x 4-core Puma at 2.0+ GHz
8GB of GDDR5X, 128-bit bus, 160-200 GB/s
1024:72:32 Polaris at 1.0 GHz, 2.0+ TFLOPS
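For reference, the GFLOPS figures in these configs come from the usual shaders × 2 ops per clock × clock rate formula; a quick check against the speculative numbers listed above:

```python
# Peak single-precision throughput: shaders * 2 FLOPs/clock * clock (GHz) -> GFLOPS
def gflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz

print(gflops(640, 0.8))   # 1024 GFLOPS (configs 1 and 3)
print(gflops(640, 0.6))   # 768 GFLOPS (config 2)
print(gflops(1024, 1.0))  # 2048 GFLOPS, i.e. ~2.0 TFLOPS (optimistic config)
```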
I'd be quite surprised if they used anything other than GCN 1.2. It seems that AMD started working on NX's APU around October/November 2014, and the first GCN 1.2 based card (R9 285) had already been released a couple of months before, so the GCN 1.2 architecture would have been finished for some time. There's not much point throwing out advancements like color buffer compression if they're long finished.
I'd also err on the side of assuming that they'll use a lower GPU clock speed than Sony or MS, particularly if we're talking about a 28nm, sub-XBO-performance, low-power-consumption device. Something like an XBO GPU config at 700MHz may be plausible (768 shaders at 700MHz works out to roughly 1.08 TFLOPS, versus XBO's ~1.31 TFLOPS at 853MHz).
Any possibility that the NX platforms employ the new ASTC texture compression standardised by the Khronos Group (the people behind Vulkan)?
An old article from 2012:
http://www.anandtech.com/show/6134/...s-30-opengl-43-astc-texture-compression-clu/4
I think this would be good for shaving off a few % of bandwidth on both the handheld and the console, if they use the new format and ditch the old, archaic S3TC texture compression that has been around since 1999.
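To put rough numbers on the potential saving: block-compressed formats cost a fixed number of bits per texel (S3TC/DXT1 is 4bpp and DXT5 is 8bpp, while ASTC's 128-bit blocks range from 8bpp at 4×4 down to 2bpp at 8×8). A hypothetical 1024×1024 texture as an illustration, ignoring mipmaps:

```python
# Storage cost of a block-compressed texture: width * height * bits-per-pixel / 8 bytes.
def texture_kb(width, height, bpp):
    return width * height * bpp / 8 / 1024

for name, bpp in [("DXT1 (S3TC)", 4), ("DXT5 (S3TC)", 8),
                  ("ASTC 6x6", 128 / 36), ("ASTC 8x8", 128 / 64)]:
    print(f"{name}: {texture_kb(1024, 1024, bpp):.0f} KB")
# DXT1: 512 KB, DXT5: 1024 KB, ASTC 6x6: ~455 KB, ASTC 8x8: 256 KB
```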
I'm not sure whether ASTC has taken off, or is likely to, but there's a reason the "archaic" S3TC has stuck around so long. There are plenty of newer algorithms which achieve more efficient compression, but they use up a lot more silicon to decompress (silicon which needs to be duplicated in each texture unit), and in general if you spend that silicon on larger texture caches instead you actually end up getting better performance. For a texture compression algorithm to replace S3TC, it needs to outperform it within a similar silicon budget, which is a much bigger challenge than just outperforming it on a signal-to-noise measure.