A large cache or embedded memory pool is definitely something to look out for. Nintendo has dedicated around 30% of each of their last few custom dies to embedded memory (3DS, Wii U CPU & GPU), so they're obviously happy to put big pools of SRAM and eDRAM on their chips. Typically this is so that the bulk of the data accesses (primarily to the framebuffer) can remain on-die, which increases bandwidth and reduces latency and power consumption. This may actually give some insight into one of the reasons Nintendo has switched to Nvidia for NX, as we've recently learnt that Maxwell and Pascal implement tile-based rendering, which is intended to achieve pretty much the same thing by optimising framebuffer access patterns so that as many of them as possible hit cache rather than main memory. From Nintendo's point of view, this would mean they could achieve the same goal with a much smaller pool of memory (perhaps 4MB compared to 32MB), and could do so in a way which is invisible to developers, who would then only have to manage a single memory pool. It wouldn't surprise me if they could get the same or better performance at lower cost and power consumption by combining a large cache with a 64-bit memory interface, rather than a smaller cache with a 128-bit interface.
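
To put a rough number on that last point, here's a back-of-envelope sketch in Python. It assumes that both configurations use the same memory type and clock, that only cache misses reach main memory, and that "same performance" just means the same fraction of the bus is in use. It says nothing about latency, cost or power. The function names and the example figures (a 3200 MT/s transfer rate, a 50% hit rate) are made up purely for illustration and aren't taken from any real chip.

```python
def bus_bandwidth_gbs(bus_width_bits, transfer_rate_mtps):
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps * 1e6 / 1e9


def required_hit_rate(narrow_bits, wide_bits, wide_hit_rate):
    """Cache hit rate the narrow-bus design needs to match the wide-bus design.

    Simplified model (an assumption, not a measurement): main-memory traffic
    is total traffic times the miss rate, and both buses run at the same
    transfer rate. Equal bus utilisation then requires
        (1 - narrow_hit_rate) / narrow_bits == (1 - wide_hit_rate) / wide_bits
    """
    return 1.0 - (1.0 - wide_hit_rate) * narrow_bits / wide_bits


if __name__ == "__main__":
    # Hypothetical transfer rate, chosen only to show the 2x gap between buses.
    for width in (64, 128):
        print(f"{width}-bit bus: {bus_bandwidth_gbs(width, 3200):.1f} GB/s peak")

    # Hypothetical case: the small cache on the 128-bit design hits 50% of the time.
    needed = required_hit_rate(64, 128, 0.5)
    print(f"64-bit design needs a hit rate of at least {needed:.0%}")
```

Under those assumptions, halving the bus width means the cache has to halve the miss rate to break even, so a 50% hit rate on the 128-bit design has to become 75% on the 64-bit one. Whether tiling plus a few MB of cache can actually deliver that is exactly the open question.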