Seriously?
First, there's barely any tracking in that demo. All it does is 3D scanning and rendering from a different point of view. The only part you could associate with tracking is the hand detection used to trigger water generation, and that's a stretch (you don't even need to track the hand, just detect "something big enough above a given height"). And tying that to VR is an even bigger stretch; it's like saying "it uses projective geometry, and VR does that too!".
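To make that concrete, here's a minimal sketch of what such a heuristic could look like on a depth frame. Everything here is an assumption for illustration (the array layout, the millimeter units, the threshold values); none of it comes from the actual demo's code.

```python
import numpy as np

# Hypothetical thresholds, chosen for illustration only.
HEIGHT_THRESHOLD_MM = 150  # how far above the sand counts as "above"
MIN_BLOB_PIXELS = 400      # how many raised pixels count as "big enough"

def hand_present(depth_mm: np.ndarray, surface_mm: np.ndarray) -> bool:
    """Return True if something large hovers above the scanned surface.

    depth_mm:   current depth frame from the sensor (millimeters).
    surface_mm: baseline depth of the sand surface (millimeters).
    Smaller depth values mean closer to the camera, i.e. higher up.
    """
    raised = (surface_mm - depth_mm) > HEIGHT_THRESHOLD_MM
    return int(raised.sum()) > MIN_BLOB_PIXELS
```

In other words, a single height threshold plus a blob-size check is enough to drive the water effect; nothing ever needs to be identified or tracked as a hand.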
What it does show is that AR isn't tied to the technology used. As long as you're displaying artificial information on a real object, or on the image of a real object, you're doing AR. Projecting images on sand is AR; so is drawing shapes on sports broadcasts, outlining targets in a visor's display, or adding funny hats and moustaches to your video chat. You won't always need a headset, inertial sensors, or even user detection to do AR.
As a side comment, it's also an example of how AR applications can be less sensitive to issues like resolution and latency. That system is obviously low-resolution and high-latency (limited by the Kinect v1, the projector, and a setup that isn't optimized for speed), but in the end that doesn't matter much for what it's trying to do.