The bolded is where the problem lies: nobody seems to be offering eDRAM on processes below 40nm (outside Intel and IBM). This means that, for a small, high-bandwidth pool of memory, Nintendo's options are reduced to SRAM or HBM.
To illustrate why SRAM is unsuitable for a framebuffer at 28nm, just look at Xbox One and PS4. You've got two consoles released at the same time at roughly similar prices, but one chose a single pool of GDDR5 and the other split pools of SRAM and DDR3. MS hoped that using a small on-die pool of memory for the framebuffer, as on Wii U and Xbox 360, would give them the best of both worlds: the GPU gets the bandwidth it needs, while cheap DDR3 allows a large 8GB of main memory.
The results are now obvious. SRAM is big (it takes up far more die space than eDRAM) and therefore very expensive. They could only accommodate 32MB of it on the SoC, and even then, with a larger SoC than Sony, there was a lot less room left for the GPU. So, they ended up with an embedded pool that isn't large enough for a console targeting 1080p, and a GPU that's almost 30% less powerful than it would otherwise have been. Meanwhile, Sony upgraded PS4's memory to 8GB at the last minute, leaving MS without even an overall capacity advantage. Nintendo would have exactly the same problems if they tried to take the SRAM approach to split pools. There's no getting around the cost and the die area implications.
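To put some rough numbers on why 32MB is tight for 1080p, here's a quick back-of-the-envelope sketch. The render-target setups are my own illustrative assumptions (typical formats, not any particular game's configuration):

```python
# Back-of-the-envelope framebuffer sizes at 1080p, to show why a 32MB
# embedded pool is tight. Bytes-per-pixel figures are typical formats;
# the renderer setups below are illustrative assumptions.
W, H = 1920, 1080
MB = 1024 * 1024

def target_mb(bytes_per_pixel):
    """Size of one full-resolution render target, in MB."""
    return W * H * bytes_per_pixel / MB

color_rgba8 = target_mb(4)   # 32-bit colour: ~7.9 MB
depth_d32   = target_mb(4)   # 32-bit depth:  ~7.9 MB
hdr_fp16    = target_mb(8)   # FP16 HDR:     ~15.8 MB

# Simple forward renderer: one colour target plus depth
forward = color_rgba8 + depth_d32
print(f"forward (RGBA8 + D32):   {forward:.1f} MB")   # ~15.8 MB

# Deferred renderer: three RGBA8 G-buffer targets, depth,
# and an FP16 HDR accumulation buffer
deferred = 3 * color_rgba8 + depth_d32 + hdr_fp16
print(f"deferred (G-buffer set): {deferred:.1f} MB")  # ~47.5 MB
```

A basic forward setup squeezes in, but a typical deferred G-buffer set blows straight past 32MB, which is why Xbox One titles so often ended up rendering below 1080p or juggling targets between ESRAM and DDR3.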
HBM is more of an unknown. It's obviously expensive, but on a per-MB basis much cheaper than SRAM. In theory a single 1GB stack of HBM1 would provide both the capacity (obviously) and the bandwidth necessary for a console competitive with PS4 when combined with some quantity of DDR3/4. That said, a large part of the cost of HBM is surely the packaging (similar to the reason Wii U's MCM is as expensive as it is). That packaging cost isn't any different from HBM1 to HBM2, and won't be all that much more for two or four stacks of memory than it would be for one. So, for all we know it may be 4GB or bust when it comes to using HBM.
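The bandwidth side of that claim checks out on paper. Using the published per-pin rates (HBM1 at 1 Gb/s over a 1024-bit stack, PS4's GDDR5 at 5.5 Gb/s over 256 bits), with the DDR3 pool width being my own hypothetical:

```python
# Rough bandwidth comparison for the HBM1 + DDR3 idea. Per-pin rates
# are the published specs; the 128-bit DDR3 pool is a hypothetical.
def bw_gbs(bus_bits, gbps_per_pin):
    """Peak bandwidth in GB/s for a bus of the given width and pin rate."""
    return bus_bits * gbps_per_pin / 8

hbm1_stack = bw_gbs(1024, 1.0)    # one HBM1 stack
ps4_gddr5  = bw_gbs(256, 5.5)     # PS4's 256-bit GDDR5
ddr3_2133  = bw_gbs(128, 2.133)   # hypothetical 128-bit DDR3-2133 pool

print(f"HBM1 stack: {hbm1_stack:.0f} GB/s")   # 128 GB/s
print(f"PS4 GDDR5:  {ps4_gddr5:.0f} GB/s")    # 176 GB/s
print(f"DDR3-2133:  {ddr3_2133:.1f} GB/s")    # ~34 GB/s
```

One stack plus a modest DDR3 pool lands in the region of 160 GB/s combined, close enough to PS4's 176 GB/s that bandwidth wouldn't be the limiting factor; the question, as above, is purely cost.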
I think the question comes down to how much total RAM Nintendo wants to go with. If it's 8GB or less, then I can't imagine an HBM+DDR3/4 approach being cheaper than GDDR5(X), or even LPDDR4, and either of the latter should provide enough bandwidth for a GCN 1.2 GPU. If they decide they need 12GB or more, then perhaps a small HBM pool plus a DDR3/4 pool might be the cheaper way to give themselves both the bandwidth and capacity they need.
An alternative, of course, is to replace the embedded memory pool with a large victim cache which acts as an L3 for both the CPU and GPU (as Apple does on many of its SoCs). It doesn't need to be large enough to hold the entire framebuffer to significantly reduce main-memory bandwidth requirements, but its effectiveness depends largely on how the GPU accesses the framebuffer (which depends on both the hardware and the way programmers use it). The PowerVR GPUs in Apple's chips are designed specifically to conserve bandwidth: their tile-based rendering keeps framebuffer traffic local, which is exactly the access pattern a cache like this thrives on. AMD's GCN, on the other hand, is designed for desktop environments with high-bandwidth GDDR5, so it might take some effort on the part of engine programmers to get good use out of an L3 cache.
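A toy model makes the tiling point concrete. An immediate-mode GPU does a read-modify-write to memory for every shaded fragment, while a tile-based one resolves all the overdraw in on-chip tile memory and writes each pixel out once. The overdraw factor here is an illustrative assumption:

```python
# Toy model of framebuffer memory traffic: immediate-mode rendering vs
# tile-based rendering. The overdraw factor is an illustrative
# assumption, not a measured figure.
W, H, BPP = 1920, 1080, 4
MB = 1024 * 1024
frame_mb = W * H * BPP / MB   # one 1080p RGBA8 colour buffer, ~7.9 MB
overdraw = 4                  # assume each pixel is shaded ~4 times

# Immediate mode: each shaded fragment is a read + write to memory
# (ignoring whatever the GPU's small colour caches absorb)
immediate_traffic = frame_mb * overdraw * 2

# Tile-based: overdraw is resolved in on-chip tile memory, and the
# finished frame is written to memory once
tiler_traffic = frame_mb

print(f"immediate-mode colour traffic: {immediate_traffic:.0f} MB/frame")
print(f"tile-based colour traffic:     {tiler_traffic:.1f} MB/frame")
```

Under these assumptions the tiler moves roughly an eighth of the colour traffic per frame, which is why a modest cache goes so much further in front of a PowerVR-style GPU than in front of a desktop-style one.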