NVIDIA has given us a few answers to the question above. The company has confirmed that GP102's FP64 and FP16 rates are identical to GP104's, which is to say very slow, and present primarily for compatibility and debugging purposes. With the exception of INT8 support, this is a bigger GP104 throughout.
Meanwhile we have a die size for GP102: 471mm², which is 139mm² smaller than GP100. Given that both (presumably) have the same number of FP32 cores, the die space savings and implications are significant. This is as good an example as we're ever going to get of the die space cost of the HPC features limited to GP100: NVLink, fast FP64/FP16 support, larger register files, etc. By splitting HPC and graphics/inference into two GPUs, NVIDIA can produce GP102 at what should be a significantly lower price (and higher yield), something they couldn't do until the market for compute products based on GP100 was self-sustaining.
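To put the die size figures above in perspective, here is a rough back-of-the-envelope sketch. Only the 471mm² and 139mm² numbers come from NVIDIA; the 300mm wafer diameter and the gross dies-per-wafer approximation are my own assumptions, and the estimate ignores defect density, scribe lines, and the actual die aspect ratio, so treat it as illustrative rather than as NVIDIA's real yield math.

```python
import math

# Figures from the article.
GP102_AREA_MM2 = 471.0
AREA_DELTA_MM2 = 139.0
GP100_AREA_MM2 = GP102_AREA_MM2 + AREA_DELTA_MM2  # 471 + 139 = 610 mm^2

# Assumption (not from the article): a standard 300 mm wafer.
WAFER_DIAMETER_MM = 300.0


def area_saving_fraction(small_mm2, large_mm2):
    """Fraction of the larger die's area saved by the smaller die."""
    return (large_mm2 - small_mm2) / large_mm2


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common first-order estimate of candidate dies per wafer.

    Wafer area divided by die area, minus a term for partial dies lost
    at the wafer edge. It says nothing about how many dies are defect-free.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


if __name__ == "__main__":
    saving = area_saving_fraction(GP102_AREA_MM2, GP100_AREA_MM2)
    print(f"GP100 die area: {GP100_AREA_MM2:.0f} mm^2")
    print(f"GP102 area saving vs. GP100: {saving:.1%}")
    for name, area in (("GP102", GP102_AREA_MM2), ("GP100", GP100_AREA_MM2)):
        print(f"{name}: ~{gross_dies_per_wafer(area)} candidate dies per wafer")
```

The area arithmetic alone works out to GP102 being a bit under a quarter smaller than GP100. The dies-per-wafer figures are only as good as the assumptions above, but they show the direction of the effect: a smaller die means more candidates per wafer, before any yield advantage is counted.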
Finally, NVIDIA has clarified the branding a bit. Despite GeForce.com labeling it "the worlds ultimate graphics card," NVIDIA stated this morning that the card's primary market is FP32 and INT8 compute, not gaming. Gaming is certainly possible, and I fully expect they'll be happy to sell you $1200 gaming cards, but the tables have essentially been flipped from past Titan cards, which were treated as gaming first and compute second. This of course opens the door to a proper GeForce-branded GP102 card later on, possibly with neutered INT8 support to enforce the market segmentation.