This gives a pretty good description of deferred rendering.
The issue with the Xbone is with the g-buffer. The more effects you want to render, the larger the 'bytes per pixel' becomes. With all the effects they wanted, Infamous SS ended up with 40 bytes per pixel, which for a 1920x1080 image requires about 80MB for the g-buffer (1920x1080x40).
If the resolution was 720p, the g-buffer would be 1280x720x40 = 35.1MB, so even then it's too big to fit into the 32MB of eSRAM. You could use the main DDR3 RAM, but that's slow as fuck, so you'd have massive performance problems.
To get a g-buffer into eSRAM, you have to reduce both what is being rendered and the resolution. A very basic scene could output at native 1080p on the Xbone with deferred rendering. At 1080p, you can have at most 16 bytes per pixel in your g-buffer. But that's not enough given the expectations of next gen, with improved lighting, materials etc. In relation to PES and the Fox Engine, the lighting is the standout improvement over last gen. This can only be achieved on the Xbone at lower resolutions.
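For anyone who wants to check the arithmetic, here's a rough sketch of the numbers above. The function names are made up for illustration, and the 32 MiB eSRAM size is an assumption taken from the Xbone's published specs. It also assumes the whole g-buffer has to sit in eSRAM at once, which is a simplification.

```python
# Assumption: the Xbone's eSRAM is 32 MiB and the whole g-buffer must fit in it.
ESRAM_BYTES = 32 * 1024 * 1024


def gbuffer_bytes(width, height, bytes_per_pixel):
    """Total g-buffer size in bytes for a given resolution and per-pixel cost."""
    return width * height * bytes_per_pixel


def max_bytes_per_pixel(width, height, budget=ESRAM_BYTES):
    """Largest whole number of bytes per pixel that still fits in the budget."""
    return budget // (width * height)


for name, (w, h) in {"1080p": (1920, 1080), "720p": (1280, 720)}.items():
    size = gbuffer_bytes(w, h, 40)  # 40 bytes per pixel, the Infamous SS figure
    print(
        f"{name}: {size / 2**20:.1f} MiB at 40 B/px, "
        f"fits in eSRAM: {size <= ESRAM_BYTES}, "
        f"max B/px that fits: {max_bytes_per_pixel(w, h)}"
    )
```

Under those assumptions, 40 bytes per pixel doesn't fit at either resolution. The per-pixel budget comes out at 16 bytes at 1080p and 36 bytes at 720p.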
Edit: all the talk about SDKs and 'closing the gap' is missing the point. The bottleneck isn't GPU performance (which the June update helped with by removing the 10% GPU reserved by Kinect). It's about the (lack of) space in the Xbone's VRAM.