With a screen resolution of 720p, I'm even less confident than I was before that there will be any difference in clock speeds between handheld mode and docked mode (and I wasn't particularly confident before). If they do differ, though, with the goal of rendering at a higher resolution for the TV, then the logical approach would be to push as much of the additional thermal headroom as possible towards the GPU rather than the CPU (and potentially towards the RAM, if they become bandwidth constrained).
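For a rough sense of what that would take, here's a back-of-envelope sketch. The resolutions are the standard ones; the assumption that per-pixel rendering cost stays constant is mine, and a simplification (real games would scale back per-pixel effects instead):

```python
# Back-of-envelope: how much extra GPU throughput a docked
# resolution bump would demand, assuming per-pixel cost is constant
# (an illustrative simplification, not how games actually scale).

handheld = 1280 * 720      # 921,600 pixels
docked   = 1920 * 1080     # 2,073,600 pixels

scale = docked / handheld
print(f"1080p has {scale:.2f}x the pixels of 720p")
# -> 1080p has 2.25x the pixels of 720p
# A clock bump alone covering that would need to be ~2.25x, which is
# far more than docking plausibly buys in thermal headroom.
```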
Speaking of RAM, Maxwell/Pascal's use of tile-based rendering doesn't affect the quantity of RAM needed, but it can reduce the bandwidth required (which would otherwise tend to scale roughly linearly with resolution). By increasing the amount of L2 cache available to the GPU (or perhaps implementing an L3 shared with the CPU), Nintendo and Nvidia could reduce bandwidth consumption, potentially by quite a lot. I actually just found out that the TX1's GPU has only 256KB of L2 cache, which is less than I would have expected. Desktop Maxwell and Pascal GPUs tend to have 1MB of L2 per 32 ROPs, so with 16 ROPs the TX1 has half the proportional cache of its bigger brothers. I would actually have expected the situation to be reversed, as mobile SoCs are typically quite a bit more bandwidth constrained than desktop GPUs.
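To spell that proportion out, here's the arithmetic (just the figures from above, nothing new):

```python
# Proportional L2 sizing: desktop Maxwell/Pascal rule of thumb is
# ~1MB of L2 per 32 ROPs, as noted above.

l2_per_rop_desktop = 1024 / 32   # 32 KB of L2 per ROP

tx1_rops = 16
tx1_l2_actual = 256              # KB, per the TX1 figure above

tx1_l2_expected = tx1_rops * l2_per_rop_desktop
print(f"Expected at desktop proportions: {tx1_l2_expected:.0f} KB")
print(f"Actual: {tx1_l2_actual} KB "
      f"({tx1_l2_actual / tx1_l2_expected:.0%} of proportional)")
# -> Expected 512 KB, actual 256 KB: half the proportional cache.
```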
In theory Nintendo could bump this quite a lot higher to keep as high a proportion of buffer accesses as possible on-die, reducing bandwidth consumption, saving energy, and potentially saving money if it lets them get away with a narrower bus and fewer RAM modules. They were willing to put 6MB of SRAM (directly addressable memory, not a cache) on the 3DS SoC, so 1-2MB of L2 cache wouldn't be a big expense by their standards, and it should let them make very good use of Nvidia's tile-based rendering.
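For a sense of how much framebuffer traffic a tile pass can keep off the memory bus, here's an illustrative estimate. Every number in it (overdraw factor, buffer formats, frame rate) is an assumption I've picked for the sake of the example, not a measured figure, and it ignores compression and texture traffic entirely:

```python
# Rough estimate of framebuffer bandwidth saved by resolving
# overdraw in on-die tile memory instead of main memory.
# All numbers here are illustrative assumptions.

width, height, fps = 1920, 1080, 60
color_bpp, depth_bpp = 4, 4          # RGBA8 color, 32-bit depth
overdraw = 3                         # avg shaded fragments per pixel

pixels = width * height

# Immediate-mode worst case: every fragment's color/depth
# read-modify-write goes through main memory (read + write).
imr_bytes = pixels * overdraw * (color_bpp + depth_bpp) * 2
imr_gbps = imr_bytes * fps / 1e9

# Tile-based: overdraw is resolved on-die; main memory sees
# roughly one final color write per pixel.
tbr_bytes = pixels * color_bpp
tbr_gbps = tbr_bytes * fps / 1e9

print(f"Immediate-mode: ~{imr_gbps:.1f} GB/s framebuffer traffic")
print(f"Tile-based:     ~{tbr_gbps:.1f} GB/s")
# -> ~6.0 GB/s vs ~0.5 GB/s under these assumptions. A larger L2
# would help by letting more of the per-tile working set stay on-die.
```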