
EDGE: "Power struggle: the real differences between PS4 and Xbox One performance"

astraycat

Member
But...

If the data is in a layout which is sub-optimal for either the CPU or the GPU, it will be more optimal to use a move engine to swizzle the data, carry out other work on both to hide the latency, then complete the processing you wanted to do on the data.

This makes the Xbox harder to program for in general but gives a good amount of scope for optimization.
Personally I don't see a lot of use for that, since textures can be swizzled into hardware tiling modes offline and just loaded directly into memory that way. There aren't a lot of algorithms on textures that call for linear CPU access, and most that do have GPU friendly versions anyway.

Unless this is more macro swizzling like copying from an interleaved buffer out to separate linear buffers (for better coalescing) or vice versa. That sounds like it could be useful.
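Rough sketch of the kind of macro swizzle I mean, de-interleaving an AoS buffer into separate linear arrays (Python purely for illustration; the names and the trivial copy loop are mine, not anything from a real SDK, a move engine would do this copy in hardware while you do other work):

```python
# Purely illustrative "macro swizzle": de-interleave an AoS buffer into SoA arrays.

def deinterleave(aos, stride):
    """Split [x0,y0,z0, x1,y1,z1, ...] into ([x0,x1,...], [y0,y1,...], [z0,z1,...])."""
    return [aos[i::stride] for i in range(stride)]

def interleave(soa):
    """Inverse: merge the per-field arrays back into one interleaved buffer."""
    return [v for element in zip(*soa) for v in element]

positions = [1.0, 2.0, 3.0,  4.0, 5.0, 6.0,  7.0, 8.0, 9.0]  # x,y,z triples
xs, ys, zs = deinterleave(positions, 3)
assert xs == [1.0, 4.0, 7.0]
assert interleave([xs, ys, zs]) == positions
```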
 
Which was my point. Just comparing Tflops figures doesn't take into account a myriad of other things, just like it didn't with a 2Tflop PS3 compared to a 1Tflop 360.
They are different but not that different.
The PS3 had an IBM single-core, dual-thread CPU with 7 Cell SPEs, plus an "old style" nVidia GPU with discrete pixel and vertex shaders, and discrete CPU/GPU memory.
The Xbox 360 had an IBM triple-core CPU (6 threads), a unified shader architecture and a unified memory architecture.

Now the PS4 and Xbox One are almost THE SAME CPU-wise, the GPU is based on the same architecture, and the memory layout is similar.
 
They are different but not that different.
The PS3 had an IBM single-core, dual-thread CPU with 7 Cell SPEs, plus an "old style" nVidia GPU with discrete pixel and vertex shaders.
The Xbox 360 had an IBM triple-core CPU (6 threads) with a unified shader architecture.

Now the PS4 and Xbox One are almost THE SAME CPU-wise and the GPU is based on the same architecture.

This.
 
actually:

http://www.anandtech.com/show/6976/...wering-xbox-one-playstation-4-kabini-temash/4

"While Kabini will go into more traditional notebook designs, Temash will head down into the tablet space. The Temash TDPs range from 3.9W all the way up to 9W. Of the three Temash parts launching today, two are dual-core designs with the highest end A6-1450 boasting 4 cores as well as support for turbo core. The A6-1450’s turbo core implementation also enables TDP sharing between the CPU and GPU cores (idle CPUs can be power gated and their thermal budget given to the GPU, and vice versa)."

It seems to be describing a Turbo mode; that would be why IGN are saying a 2.75 boost clock and a 1.6 BASE clock, so it can upclock its cores IF not all cores are running... this seems legit??

2*2.75 = 5.5 = Sony's GDDR5 memory speed. It's easy to explain why the FCC has 2.75 as the max clock in the PS4. There's no strange "204? 218? Show me the receipts!" problem here.

If there's a turbo boost, cool, but the logical explanation is that it's talking about the GDDR5.
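For the record, the arithmetic is just the double-data-rate relationship: the effective per-pin rate is twice the listed 2.75GHz clock (quick check below, nothing console-specific in it):

```python
# GDDR5 transfers data on both edges of its (write) clock, so the effective rate
# is 2x the listed clock: 2.75 GHz is consistent with the "5500 MHz effective" figure.
clock_ghz = 2.75
effective_gtps = 2 * clock_ghz
print(effective_gtps)  # 5.5
```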
 

Chobel

Member
It is not about computing power, but about having a much larger sample of data to derive AI profiles from. Instead of just having the driving profile of the "local" player you have the profiles of all players in the world to analyze. Computing power is not really important since the computation of profiles is not time-critical. You can perform it literally "over night", and thus downscale the amount of cloud-based compute resources greatly. The only thing that needs to scale on the backend side is, of course, the storage for profile information. The only thing that needs to scale with the number of users is the frontend that receives new data from individual players, but that is certainly something that only needs to be done, at most, once after each race. The latter is not more difficult than running a website.

I get that, but Turn10 said that they're using 600% of the Xbox One's power in the cloud for each player. So even if the Drivatar is updated once a day, you'll need a lot of resources.
 

Vizzeh

Banned
The FCC listed the highest frequency in the overall system, not specifically the frequency of a processor; hence that 2.75GHz refers to memory, WiFi, or something else among those things.

Yeah, it's referencing the RAM at 5.5GHz dual channel, but AnandTech are referencing a turbo mode in the Jaguars. Perhaps IGN are seeing 2.75GHz and turbo mode and drawing parallels.
 
actually:

http://www.anandtech.com/show/6976/...wering-xbox-one-playstation-4-kabini-temash/4

"While Kabini will go into more traditional notebook designs, Temash will head down into the tablet space. The Temash TDPs range from 3.9W all the way up to 9W. Of the three Temash parts launching today, two are dual-core designs with the highest end A6-1450 boasting 4 cores as well as support for turbo core. The A6-1450’s turbo core implementation also enables TDP sharing between the CPU and GPU cores (idle CPUs can be power gated and their thermal budget given to the GPU, and vice versa)."

It seems to be describing a Turbo mode; that would be why IGN are saying a 2.75 boost clock and a 1.6 BASE clock, so it can upclock its cores IF not all cores are running... this seems legit??

Seriously, if Sony had a turbo mode they would have said so, because a 70% upclock in turbo mode would make me buy a PS4; it would have the performance I expected from next gen. Hell, even a turbo mode giving 2.0/1.0 GHz would have made me get a PS4. But the reality is they don't have those clock speeds. The upclock rumor is as stupid as the dual APU or dual GPU rumors for the X1.
 

USC-fan

Banned
Right, but that's an extra on the Xbox. The DDR3 is coherent if you want it to be. The ESRAM should be used as a GPU scratch pad.

I can't imagine many scenarios where you'd want the CPU accessing it. That's the point of the move engines with their swizzling, to transfer data between optimum cache access layouts.

Move engines are just DMA. They've been a standard part of GPUs for many years. They are really nothing special.
 
Yeah, it's referencing the RAM at 5.5GHz dual channel, but AnandTech are referencing a turbo mode in the Jaguars. Perhaps IGN are seeing 2.75GHz and turbo mode and drawing parallels.

Then IGN are wrong (not surprising). Yes, turbo modes do exist. Turbo modes do not come close to doubling clock speeds.
 

Slayer-33

Liverpool-2
Seriously, if Sony had a turbo mode they would have said so, because a 70% upclock in turbo mode would make me buy a PS4; it would have the performance I expected from next gen. Hell, even a turbo mode giving 2.0/1.0 GHz would have made me get a PS4. But the reality is they don't have those clock speeds.

Why wouldn't you get one without it? lol
 

Fafalada

Fafracer forever
Vizzeh said:
It seems to be describing a Turbo mode; that would be why IGN are saying a 2.75 boost clock and a 1.6 BASE clock, so it can upclock its cores IF not all cores are running... this seems legit??
It's a legit part of the PC counterparts, but it's questionable whether this is predictable enough to be valuable in a console - ie. there are definitely always going to be titles that don't have all hw-threads under load, but if clock-scaling weren't predictable, that would cause more headaches than it would help.
 

bonus_sco

Banned
Personally I don't see a lot of use for that, since textures can be swizzled into hardware tiling modes offline and just loaded directly into memory that way. There aren't a lot of algorithms on textures that call for linear CPU access, and most that do have GPU friendly versions anyway.

Unless this is more macro swizzling like copying from an interleaved buffer out to separate linear buffers (for better coalescing) or vice versa. That sounds like it could be useful.

GPGPU algorithms can use "textures" as input too, so that's where your second paragraph sort of comes in. Tiling/swizzling is what it can be used for; there will be algorithms which benefit from different access patterns on the CPU and GPU, and "coding to the metal" for ultimate performance will mean doing this.

Better cache utilisation can also reduce memory bandwidth for a given algorithm because data is read into caches in lines. If you have spatial coherency, you'll have fewer memory reads during computation.
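Back-of-envelope example of the "fewer memory reads" point (Python, made-up sizes, no real cache model): if you only want one 4-byte field per element, packing that field contiguously means each 64-byte line you fetch carries 16 useful values instead of 1.

```python
# Count distinct 64-byte cache lines touched when reading one 4-byte field
# from every element, for two layouts of the same data.
CACHE_LINE = 64
N = 10_000
STRUCT_SIZE = 64   # interleaved layout: each element happens to span a whole line
FIELD_SIZE = 4     # the one field we actually read

lines_interleaved = N * STRUCT_SIZE // CACHE_LINE   # 10000 lines fetched
lines_packed      = N * FIELD_SIZE  // CACHE_LINE   # 625 lines fetched

print(lines_interleaved, lines_packed)  # ~16x fewer lines -> less bandwidth spent
```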
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
I get that, but Turn10 said that they're using 600% of the Xbox One's power in the cloud for each player. So even if the Drivatar is updated once a day, you'll need a lot of resources.

That's bullshit from Turn10. It does not matter how many resources you need to process a single profile if that processing is not time-critical. I could use cloud-based resources equivalent to 120 Jaguar cores to process incoming queued data of all existing players and claim that I am using 15x the power of an XBO for each player's drivatar, when in fact I only have the equivalent of 120 Jaguar cores in total. There is no need to reserve those resources exclusively for individual players if I can process data asynchronously without hard time constraints.

For instance, Map-Reduce-based[1] frameworks for crunching through large data sets on distributed infrastructures (in the cloud) can employ very large numbers of servers as compute nodes, but they can also easily process many different tasks at the same time if you add a queueing frontend. Google pretty much works like this. They use something like >20 servers for each search request (can't find the source anymore), but that does not mean that Google has 20 servers for every user.

[1] http://en.wikipedia.org/wiki/MapReduce
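A toy version of that argument in Python (no real cloud anywhere, and the numbers are arbitrary): a small fixed pool of workers drains a queue of every player's uploaded race data, so a per-player "power" claim says nothing about how much hardware actually exists.

```python
# Toy "queue + fixed worker pool" model: many players, few workers,
# nothing reserved per player because the work is not time-critical.
from concurrent.futures import ThreadPoolExecutor

def update_profile(player_id):
    # Stand-in for the real (overnight, non-time-critical) profile computation.
    return player_id

players = range(10_000)                              # everyone who raced today
with ThreadPoolExecutor(max_workers=16) as pool:     # a modest, fixed amount of hardware
    updated = sum(1 for _ in pool.map(update_profile, players))

print(updated)  # 10000 profiles processed by only 16 workers
```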
 

spwolf

Member
Seriously, if Sony had a turbo mode they would have said so, because a 70% upclock in turbo mode would make me buy a PS4; it would have the performance I expected from next gen. Hell, even a turbo mode giving 2.0/1.0 GHz would have made me get a PS4. But the reality is they don't have those clock speeds. The upclock rumor is as stupid as the dual APU or dual GPU rumors for the X1.

Turbo mode is there because in tablets/laptops you want to save battery... For Sony, a lower clock in the PS4 would be because of heat.
 

flying dutchman

Neo Member
For me the power is important. I would have preferred both PS4 and XB1 to be $700 and be 4 TFLOP beasts. I totally blame the Wii for showing that console makers don't have to push the boundaries of graphics to make a truckload of cash. Bastards.
 

Chobel

Member
That's bullshit from Turn10. It does not matter how many resources you need to process a single profile if that processing is not time-critical. I could use cloud-based resources equivalent to 120 Jaguar cores to process incoming queued data of all existing players and claim that I am using 15x the power of an XBO for each player's drivatar, when in fact I only have the equivalent of 120 Jaguar cores in total. There is no need to reserve those resources exclusively for individual players if I can process data asynchronously without hard time constraints.

For instance, Map-Reduce-based[1] frameworks for crunching through large data sets on distributed infrastructures (in the cloud) can employ very large numbers of servers as compute nodes, but they can also easily process many different tasks at the same time if you add a queueing frontend. Google pretty much works like this. They use something like >20 servers for each search request (can't find the source anymore), but that does not mean that Google has 20 servers for every user.

[1] http://en.wikipedia.org/wiki/MapReduce

I expected better from Turn 10; I didn't think they'd use bullshit buzzwords too.

Anyway thanks for the clarification.
 
They have swizzle, LZ compression, and JPEG encode and decode support too.

Edit: should be LZ encode and decode and JPEG decode, I think.

#gamechanger ?

I get it, but don't get carried away with red herrings, most of this stuff is not uncommon in modern GPUs (UVD engine for video decode for example).
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Better cache utilisation can also reduce memory bandwidth for a given algorithm because data is read into caches in lines. If you have spatial coherency, you'll have fewer memory reads during computation.

That is indeed critical for GPUs with limited cache sizes (compared to CPUs) but I am not sure that you necessarily need to use a different physical memory layout for the CPU. The width of cache lines on modern GPUs is comparable to that of CPUs. Both Jaguar and GCN have a cache line width of 64 bytes.
 

Vizzeh

Banned
Isn't Turn 10 referencing 600% power in the cloud for "AI"?

http://www.develop-online.net/news/44979/Turn-10-Xbox-One-cloud-can-boost-AI-processing-by-600

- I am completely against the cloud bs, but AI sounds feasible?

""So we can now make our AI instead of just being 20 per cent, 10 per cent of the box's capability, we can make it 600 per cent of the box's capability. Put it in the cloud and free up that 10 per cent or 20 per cent to make the graphics better - on a box that's already more powerful than we worked on before.""
 

bonus_sco

Banned
That is indeed critical for GPUs with limited cache sizes (compared to CPUs) but I am not sure that you necessarily need to use a different physical memory layout for the CPU. The width of cache lines on modern GPUs is comparable to that of CPUs. Both Jaguar and GCN have a cache line width of 64 bytes.

If you want to wring every last drop of performance out of the system, then you'll want as good a cache optimisation as you can get. If triggering a move engine to swizzle, carrying out other work, then processing your algorithm gives better cache utilisation, it should run faster, stall less, thrash the cache less and have less impact on other threads. This could give a higher instruction throughput on all threads on the processor.
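The pattern being described, kick off the swizzle, hide its latency behind other work, then consume the result, looks roughly like this (a background thread stands in for the move engine; none of this is a real console API):

```python
# Sketch of "start the DMA swizzle, do other work, then process the swizzled data".
from concurrent.futures import ThreadPoolExecutor

def swizzle(buffer):
    # Stand-in for the hardware swizzle; the actual transform doesn't matter here.
    return buffer[::-1]

def other_work():
    return sum(range(100_000))

with ThreadPoolExecutor(max_workers=1) as move_engine:
    job = move_engine.submit(swizzle, list(range(64)))  # 1) kick off the copy/swizzle
    covered = other_work()                              # 2) overlap it with other work
    swizzled = job.result()                             # 3) then run the cache-friendly pass

print(swizzled[:4], covered)
```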
 

bonus_sco

Banned
#gamechanger ?

I get it, but don't get carried away with red herrings, most of this stuff is not uncommon in modern GPUs (UVD engine for video decode for example).

No, it's not a game changer. It's important for getting the best utilisation of the Xbox's resources, and it's one of the reasons it's "harder" to program for.

As far as I'm aware the PS4 can't swizzle data in this way so developers just have to work out whether they would rather have compromised cache access on the CPU or GPU for certain algorithms which use the same data. There will be plenty of situations where it won't matter.
 
Why wouldn't you get one without it? lol

Because I have a capable gaming rig and don't hate mouse and keyboard.
Sony haven't shown enough games I'm interested in, or spoken of plans to make the PS4 a debug unit and release a native XNA kind of program.

So yeah, paying $399 for a console that will most likely gather dust for a while is not worth it. It's a better choice to spend $399~499 on a new GPU which will be used daily, either for programming or PC gaming related activities. Saving up money for a PS4 is not that big of an issue, and I'm pretty sure I will have one in 2014 or 2015.

I might even get a Wii U for the Wind Waker remake or for the new Zelda; it has been too long since I played the series. Majora's Mask and Wind Waker are the last ones I played.

The X1 I'm getting at launch because in my teens I was playing Halo and Gears; those series have nostalgic value for me, so I can't see myself missing a Halo even if it's just to bitch about it if it sucks. A big part of GAF may troll the Ryse thread, but the setting is pretty awesome and doesn't get exploited too much by gaming, if you ask me. Friends also bounced back to the X1 after the DRM reversal.

And as a programmer who is interested in games technology, Microsoft speaking about a native XNA-like program makes me happy. With XNA I learned so much about programming and OOP design (maybe went a bit too pure/overboard with OOP).

tl;dr: because of reasons, and next-gen performance is worthless to me.


Isn't Turn 10 referencing 600% power in the cloud for "AI"?

http://www.develop-online.net/news/44979/Turn-10-Xbox-One-cloud-can-boost-AI-processing-by-600

- I am completely against the cloud bs, but AI sounds feasible?

""So we can now make our AI instead of just being 20 per cent, 10 per cent of the box's capability, we can make it 600 per cent of the box's capability. Put it in the cloud and free up that 10 per cent or 20 per cent to make the graphics better - on a box that's already more powerful than we worked on before.""

Bringing it into perspective:
10~20% is, what, like 10~20 GFLOPS on the CPU; that means they will have something like 60~120 GFLOPS worth of offline performance in the cloud.
The AI they have is a neural network. From what I vaguely remember, a neural network needs data to work with; if you have 5 million players to get profiles from, you get a lot of data to process offline.
When the neural network is done processing, the next time the player connects they will get a new bunch of drivatars to race against.
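Putting rough numbers on that (the ~100 GFLOPS CPU total is my assumption just to make the 10~20% figure line up; nothing official):

```python
# Restating the post's arithmetic: 6x the local AI share, as an offline/nightly budget.
cpu_total_gflops = 100            # assumed Jaguar CPU total, for illustration only
for share in (0.10, 0.20):        # "10-20% of the box" spent on AI locally
    local_ai = cpu_total_gflops * share
    cloud_ai = 6 * local_ai       # the "600%" claim, read as 6x that AI slice
    print(f"local ~{local_ai:.0f} GFLOPS -> cloud ~{cloud_ai:.0f} GFLOPS")
# -> roughly 60-120 GFLOPS of offline compute, which a shared pool can easily cover
```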
 
No, it's not a game changer. It's important for getting the best utilisation of the Xbox's resources, and it's one of the reasons it's "harder" to program for.

As far as I'm aware the PS4 can't swizzle data in this way so developers just have to work out whether they would rather have compromised cache access on the CPU or GPU for certain algorithms which use the same data. There will be plenty of situations where it won't matter.

Well, so you have a smaller GPU with slower memory, a fast scratchpad and swizzle engines.
Working REALLY HARD they'll probably close the gap between DDR3 and GDDR5 (at the expense of a really complex driver), but it still remains a smaller GPU with fewer shaders, half the ROPs and the rest.
 

bonus_sco

Banned
Well, so you have a smaller GPU with slower memory, a fast scratchpad and swizzle engines.
Working REALLY HARD they'll probably close the gap between DDR3 and GDDR5, but it still remains a smaller GPU with fewer shaders, half the ROPs and the rest.

Swizzling has little to do with closing the bandwidth gap between DDR3 and GDDR5, I've no idea why you think it does.

The move engines let the system convert data into different layouts for better cache utilisation on the CPU and GPU. This saves bandwidth over unoptimised data as a side effect of reducing cache misses. The main aim of reducing cache misses is to decrease the amount of time spent waiting on the processor fetching data from memory. All processors can compute faster than memory can serve data which is why caches are used.
 

Vizzeh

Banned
Bringing it into perspective:
10~20% is, what, like 10~20 GFLOPS on the CPU; that means they will have something like 60~120 GFLOPS worth of offline performance in the cloud.
The AI they have is a neural network. From what I vaguely remember, a neural network needs data to work with; if you have 5 million players to get profiles from, you get a lot of data to process offline.
When the neural network is done processing, the next time the player connects they will get a new bunch of drivatars to race against.

I can definitely see the benefit of that; however, it seems like Turn10 are painting a picture that points towards the X1 gaining CPU resources from that, when surely it consumes more to send the data to the servers and then get that data back and use it.

It seems they're suggesting the alternative was a neural X1 that did the processing for the drivatars itself - learning how you play locally, instead of on the servers (which wouldn't have been the case, it would just be preloaded drivatars?). So they're taking a phantom CPU saving and saying they will make better graphics with it? (I can't see how it can be that MUCH better graphics if the CPU is doing it anyway.)

The cloud has definite advantages; I hope they are not ruining something that will be useful by overshadowing it with BS saying it can make your console 3x more powerful etc.
 

alterego

Junior Member
A 50% graphics advantage to PS4 is huge.

 

Chobel

Member
Swizzling has little to do with closing the bandwidth gap between DDR3 and GDDR5, I've no idea why you think it does.

The move engines let the system convert data into different layouts for better cache utilisation on the CPU and GPU. This saves bandwidth over unoptimised data as a side effect of reducing cache misses. The main aim of reducing cache misses is to decrease the amount of time spent waiting on the processor fetching data from memory. All processors can compute faster than memory can serve data which is why caches are used.

It's for saving bandwidth; the PS4 can do swizzling too... not with dedicated hardware, but it can be done. You'll just have two memory addresses: the swizzled data and the non-swizzled data.
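Something like this is what I mean by doing it in software and keeping two copies: tile a linear row-major image into small blocks so neighbouring texels share a cache line (the 4x4 tiles and the naming are just for illustration, not any real hardware tiling mode):

```python
# Illustrative software swizzle: copy a linear image into 4x4 tiles.
# Both buffers stay resident, i.e. two addresses: the linear copy and the tiled copy.

def tile_swizzle(linear, width, height, tile=4):
    out = []
    for ty in range(0, height, tile):          # walk tiles top-to-bottom
        for tx in range(0, width, tile):       # ...and left-to-right
            for y in range(ty, ty + tile):     # emit each tile's rows contiguously
                out.extend(linear[y * width + tx : y * width + tx + tile])
    return out

linear = list(range(8 * 8))
tiled = tile_swizzle(linear, 8, 8)
# first tile == top-left 4x4 block of the image
assert tiled[:16] == [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27]
```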
 
I can definitely see the benefit of that; however, it seems like Turn10 are painting a picture that points towards the X1 gaining CPU resources from that, when surely it consumes more to send the data to the servers and then get that data back and use it.

It seems they're suggesting the alternative was a neural X1 that did the processing for the drivatars itself - learning how you play locally, instead of on the servers (which wouldn't have been the case, it would just be preloaded drivatars?). So they're taking a phantom CPU saving and saying they will make better graphics with it? (I can't see how it can be that MUCH better graphics if the CPU is doing it anyway.)

The cloud has definite advantages; I hope they are not ruining something that will be useful by overshadowing it with BS saying it can make your console 3x more powerful etc.

My guess is that searching an AI state graph is cheaper than analyzing the scene and then calculating the actions the AI has to take. Hence it frees up some performance, but nothing is holding back PD or others from also doing this offline processing of player racing behavior profiles.

It should be interesting what 343 will do with the cloud in Halo 5; from what I gathered, NVIDIA's cheapest cloud GI solution is based on some research/a paper a 343 engineer has written, if I'm not mistaken.
 
from the guy that announced that "emperor is naked" (GT)?

hah.

Yep. There was a considerable amount of pettiness towards GT from Turn 10 this gen like that. Obviously it's not uncommon in this industry to see such things, but with that in mind I don't see why anyone would really expect 'better' from the likes of Turn 10.
 

Chobel

Member
It should be interesting what 343 will do with the cloud in Halo 5; from what I gathered, NVIDIA's cheapest cloud GI solution is based on some research/a paper a 343 engineer has written, if I'm not mistaken.

They'd be stupid if they did that; it would mean Halo 5 must be played online, and only people with good, really good internet will get the best quality.
 
Swizzling has little to do with closing the bandwidth gap between DDR3 and GDDR5, I've no idea why you think it does.

The move engines let the system convert data into different layouts for better cache utilisation on the CPU and GPU. This saves bandwidth over unoptimised data as a side effect of reducing cache misses. The main aim of reducing cache misses is to decrease the amount of time spent waiting on the processor fetching data from memory. All processors can compute faster than memory can serve data which is why caches are used.

Different words, same stuff... memory appears more responsive, less miss-prone and overall "better" with move engines = closes the gap between DDR3 and GDDR5.
Good.

Still a lower spec GPU.
 

Vizzeh

Banned
Swizzling has little to do with closing the bandwidth gap between DDR3 and GDDR5, I've no idea why you think it does.

The move engines let the system convert data into different layouts for better cache utilisation on the CPU and GPU. This saves bandwidth over unoptimised data as a side effect of reducing cache misses. The main aim of reducing cache misses is to decrease the amount of time spent waiting on the processor fetching data from memory. All processors can compute faster than memory can serve data which is why caches are used.

GPUs also seem to serve a better function by doing this with the customisation of queues: PS4 = 8 ACEs * 8 compute queues = 64 compute queues total vs X1 = 2 ACEs * 8 compute queues = 16 compute queues total. 64 vs 16 is surely huge on the GPU, as important as on the CPU I'd guess, especially as the generation matures and takes more benefit from GPGPU.
 

JaggedSac

Member
But shit, even using Forza 5 as the example, wouldn't that prove CLOUD isn't doing shit? It's 60fps, yes, but it is clearly sacrificing on the visual spectrum in many ways: it lacks dynamic lighting, its reflections are the cheap, as-seen-in-PS360-gen type, tons of things it's simply not doing that even DriveClub is.

So, can he point to what areas specifically he thinks benefited from POWER OF THE CLOUD™? Because if anything this would prove that CLOUD is not doing much to help

I don't think there are any graphical things going on in the cloud in Forza 5. Only the AI stuff. And that isn't using the cloud in real time.
 

bonus_sco

Banned
It's for saving bandwidth; the PS4 can do swizzling too... not with dedicated hardware, but it can be done. You'll just have two memory addresses: the swizzled data and the non-swizzled data.

You'll have two addresses on the Xbox One too.

I'm not sure what the benefit of swizzling on the CPU or GPU would be on the PS4. The benefit on the Xbox is that you have dedicated hardware to do it via DMA and you can carry on doing other things while it's busy. Reading and writing the data on the CPU or GPU would likely thrash the data cache which is what you'd be trying to avoid.
 

doemaaan

Member
Can somebody post the link to the thread where someone has an article/blog (can't remember which) saying not to trust EDGE and their claim that the ps4 is more powerful? I forgot to bookmark and now I can't find it.
 

bonus_sco

Banned
GPUs also seem to serve a better function by doing this with the customisation of queues: PS4 = 8 ACEs * 8 compute queues = 64 compute queues total vs X1 = 2 ACEs * 8 compute queues = 16 compute queues total. 64 vs 16 is surely huge on the GPU, as important as on the CPU I'd guess, especially as the generation matures and takes more benefit from GPGPU.

This has nothing to do with L1 data and texture cache misses.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Reading and writing the data on the CPU or GPU would likely thrash the data cache which is what you'd be trying to avoid.

I still don't see how this would thrash caches if the sizes of the cache lines in Jaguar and GCN are the same. The granularity is simply the same.
 
They'd be stupid if they did that; it would mean Halo 5 must be played online, and only people with good, really good internet will get the best quality.

Packets don't have to be that big of course, depending on the implementation, and you don't have to update the GI every frame. You could do it for every 2 degrees the local star in the system moves; hell, you can even say that a day and night cycle lasts longer.

People that don't have enough bandwidth will get no GI solution, or a worse one.
A first-party studio has those options.
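Rough numbers on the "every 2 degrees" idea (the day length and frame rate are made-up assumptions, just to show how rarely you'd actually need an update):

```python
# How often a cloud GI update would arrive if it's keyed to sun movement, not frames.
day_length_s = 24 * 60                       # assume a 24-minute in-game day/night cycle
degrees_per_second = 360 / day_length_s      # 0.25 deg/s
update_every_degrees = 2
seconds_between_updates = update_every_degrees / degrees_per_second
frames_between_updates = seconds_between_updates * 30   # at 30 fps

print(seconds_between_updates, frames_between_updates)  # 8.0 s, i.e. one update per 240 frames
```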
 