
X1 DDR3 RAM vs PS4 GDDR5 RAM: “Both Are Sufficient for Realistic Lighting” (Geomerics)

eDRAM is only available on their highest-performance iGPU (GT3e), which we are comparing to Richland/Trinity, which are mid-range parts. Like I said, Kaveri, which is more competitive with the i7-4x50HQ, should wipe the floor with it when it comes to GPU performance.

Nope, they are caused by the lack of frame pacing/metering. Kepler has frame metering built into the silicon, which is why micro-stuttering isn't a big issue with their cards, and AMD has already acknowledged the issue. In fact, we should get a driver this month that fixes the micro-stuttering, which is again a multi-GPU problem, and I don't know why you keep bringing that up.

Once again you seem to gloss over the reason why embedded RAM was incorporated.

This is information coming straight from Intel. iGPUs can't run solely off DDR3 RAM because they will be bandwidth starved; this isn't even up for discussion, it's a fact. eDRAM is implemented to help with the bandwidth issues; the low latency is just a side effect of it being on die.
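As a rough sanity check on that bandwidth point, here is a minimal back-of-the-envelope sketch. The bus widths and transfer rates are illustrative assumptions (a dual-channel DDR3-1600 setup and a generic 256-bit GDDR5 card), not figures from this thread:

```cpp
#include <cstdio>

// Peak bandwidth in GB/s = transfers per second (in MT/s) * bytes per transfer / 1000.
static double peak_gbps(double mega_transfers_per_s, double bytes_per_transfer) {
    return mega_transfers_per_s * bytes_per_transfer / 1000.0;
}

int main() {
    // Dual-channel DDR3-1600: two 64-bit channels = 16 bytes per transfer,
    // and the pool is shared between the CPU and the iGPU.
    std::printf("Dual-channel DDR3-1600: %.1f GB/s (shared)\n", peak_gbps(1600, 16));

    // A mid-range discrete card with a 256-bit GDDR5 bus at ~4800 MT/s (assumed)
    // has its memory entirely to itself.
    std::printf("256-bit GDDR5 @ 4800 MT/s: %.1f GB/s (dedicated)\n", peak_gbps(4800, 32));
    return 0;
}
```

Whatever the exact parts, the gap is roughly an order of magnitude, which is the starvation being described.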

I'm done here. You sound like a car dealer changing the subject every time I find a flaw in your argument.

Also, if it's proper 6T-SRAM, which is what the XB1's eSRAM is widely speculated to be, then you could be looking at a latency of anywhere from 10-15 cycles, give or take. It could be slightly higher, I really have no idea, but we are more than likely in the right range even if my specific number is off. I know people hate this, but I'm going to post a comment from a game developer, and I don't care how offended people are by my attempt to use an actual game creator's comments to support what I'm saying.

That figure is pretty unrealistic. Latency should be between L2 and main RAM, not better than L2. Triple what you suggest would be awesome for the One.

I am pretty sure that bandwidth is the only really relevant reason why embedded memory exists in GPUs. For instance, you can concurrently fetch textures from main memory while drawing pixels to a render target in the eSRAM without interference between the two. However, these scenarios are limited by the eSRAM's size. Many people mention many possible scenarios but forget that you can't implement them all at the same time. Without knowing the concrete numbers, I am pretty sure that the eSRAM is not big enough to be usable as a texture cache and for render targets at the same time. For instance, the g-buffers of Killzone Shadow Fall already eat up 40MB of RAM and their backbuffers eat up an additional 31MB. That is just not possible in the eSRAM, even without storing any textures in it.
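To make the size argument concrete, here is a rough render-target budget at 1080p. The target count and formats are assumptions for illustration (the Killzone figures quoted above come from a fatter, mixed-format G-buffer):

```cpp
#include <cstdio>

int main() {
    const double w = 1920, h = 1080;
    const int bytes_per_pixel = 4;   // assume lean 32-bit formats throughout
    const int targets = 5;           // e.g. four colour G-buffer targets plus depth/stencil
    const double mib = w * h * bytes_per_pixel * targets / (1024.0 * 1024.0);
    std::printf("1080p G-buffer estimate: %.1f MB (eSRAM: 32 MB)\n", mib);  // ~39.6 MB
    return 0;
}
```

Even with lean 32-bit formats, a deferred G-buffer alone overflows the 32MB pool before a single texture is considered.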

Don't you believe the eSRAM here is going to be used as a next-level cache?

As I see it, it won't be used to store buffers as on the PS2 or 360, but as a cache, and in that scenario 32MB is more than enough for a 95% hit rate, Intel claims. People are forgetting that the DDR3 in the One is running at 2100 MHz, and that is more than enough not to bottleneck a mid-tier GPU like the One's APU, and to handle buffers, especially when you have included extra logic to move data.

I'm looking at the One diagrams, but they aren't too detailed. Can the CPU use this eSRAM in a useful way too? Having that pool of cache might lead to minor improvements in the CPU's IPC, starved as it is with 2MB of L2.
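For what it's worth, the 68GB/s figure quoted later in the thread falls straight out of that DDR3 speed and a 256-bit interface. A minimal sketch, where the ~2133 MT/s and 256-bit width are the commonly reported values, treated here as assumptions:

```cpp
#include <cstdio>

int main() {
    const double transfers_per_s = 2133e6;  // "2100 MHz" DDR3, i.e. ~2133 MT/s (assumed)
    const double bus_bytes = 256.0 / 8.0;   // 256-bit interface (assumed)
    std::printf("Peak DDR3 bandwidth: %.1f GB/s\n", transfers_per_s * bus_bytes / 1e9);  // ~68.3
    return 0;
}
```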

I don't think they're on the same die though.

They aren't on the same die, for fab convenience. That way they can produce both the 5200 and the 4600 on the same line, then choose what to mount. And, in the future, they can still fab the eDRAM on the old node.
 
Don't you believe the eSRAM here is going to be used as a next-level cache?

As I see it, it won't be used to store buffers as on the PS2 or 360, but as a cache, and in that scenario 32MB is more than enough for a 95% hit rate, Intel claims. People are forgetting that the DDR3 in the One is running at 2100 MHz, and that is more than enough not to bottleneck a mid-tier GPU like the One's APU, and to handle buffers, especially when you have included extra logic to move data.

I'm looking at the One diagrams, but they aren't too detailed. Can the CPU use this eSRAM in a useful way too? Having that pool of cache might lead to minor improvements in the CPU's IPC, starved as it is with 2MB of L2.

You're saying their intended use for the eSRAM is a cache for the CPU?

Why? Why wouldn't they attach the eSRAM directly to the CPU, instead of connecting it to the GPU and making CPU access go through the north bridge, a potential bottleneck? Can they use it for the CPU? Yes, of course, but you're suggesting they'd use it for something their hardware layout doesn't seem to agree with.

Also, their L2 is 4MB in total: 2MB per Jaguar module.

EDIT: Here is the diagram
[image: 9VeQYE3.jpg]
 
You're saying their intended use for the eSRAM is a cache for the CPU?

Why? Why wouldn't they attach the eSRAM directly to the CPU, instead of connecting it to the GPU and making CPU access go through the north bridge, a potential bottleneck? Can they use it for the CPU? Yes, of course, but you're suggesting they'd use it for something their hardware layout doesn't seem to agree with.

Also, their L2 is 4MB in total: 2MB per Jaguar module.

EDIT: Here is the diagram
[image: 9VeQYE3.jpg]

Those Jaguar cores are two CPUs glued together, just like the old Core 2 Quads. A Core 2 Quad needed to go through the north bridge to access the other pair's L2; I don't know how AMD solved that in this APU. Even in that scenario, it was faster than accessing main memory.

I'm not saying that the eSRAM is intended for the CPU, but that it might also help with CPU performance, besides helping with CPU-GPU communication through a shared pool of cache. After all, the 360's GPU was able to read directly from the CPU's L2. That was one of its improvements over the PC's PCIe interface. That diagram doesn't provide enough data.
 
Those Jaguar cores are two CPUs glued together, just like the old Core 2 Quads. A Core 2 Quad needed to go through the north bridge to access the other pair's L2; I don't know how AMD solved that in this APU. Even in that scenario, it was faster than accessing main memory.

I'm not saying that the eSRAM is intended for the CPU, but that it might also help with CPU performance, besides helping with CPU-GPU communication through a shared pool of cache. After all, the 360's GPU was able to read directly from the CPU's L2. That was one of its improvements over the PC's PCIe interface. That diagram doesn't provide enough data.

You have to keep in mind that latency isn't as big a problem with lower-clocked cores. If they were clocked at something like 3.2GHz, yes, latency would cause big problems. The frequency of these cores, along with the latency of the DDR3 (or GDDR5 for the PS4), should suffice. I don't see something like the eSRAM providing MUCH of a performance boost, if any, for those CPUs.
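The clock-speed point can be put in numbers: the same memory latency in nanoseconds costs proportionally fewer cycles on a slower core. A minimal sketch, where the ~100 ns figure is an assumed, typical DRAM round trip rather than a measured console number:

```cpp
#include <cstdio>

int main() {
    const double dram_latency_ns = 100.0;  // assumed typical round trip to DRAM
    for (double ghz : {1.6, 3.2}) {
        // cycles lost per miss = latency in ns * cycles per ns
        std::printf("%.1f GHz core: a miss costs ~%.0f cycles\n", ghz, dram_latency_ns * ghz);
    }
    return 0;
}
```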
 
I probably didn't explain my point well.

I can see both designs easily suffering from CPU bottlenecks with such weak cores, more so in the case of the PS4 with its beefier GPU. What I'm saying is that maybe they can use that shared cache to reduce GPU stalls when waiting on the CPU, bypassing main memory. Not as the main objective of the eSRAM, but as an added benefit.

In my head it makes sense to try to milk every CPU cycle when the CPU is that slow. And I still think the eSRAM in the One isn't there to hold the framebuffer; that it isn't just local video memory for the GPU.
 
I probably didn't explain my point well.

I can see both designs easily suffering from CPU bottlenecks with such weak cores, more so in the case of the PS4 with its beefier GPU. What I'm saying is that maybe they can use that shared cache to reduce GPU stalls when waiting on the CPU, bypassing main memory. Not as the main objective of the eSRAM, but as an added benefit.

In my head it makes sense to try to milk every CPU cycle when the CPU is that slow. And I still think the eSRAM in the One isn't there to hold the framebuffer; that it isn't just local video memory for the GPU.

It isn't, otherwise they could have used eDRAM and spun it off to a daughter die with the ROPs, just like with the 360 design.

I think the point of the eSRAM was to provide a scratchpad with sufficiently low latency to make it useful as an unmanaged next-level cache for better GPU utilisation, AND to provide sufficient bandwidth that it could be used as a framebuffer if developers so wished.

The issue I have with many who don't seem to understand things properly is that they seem to think (or rather imply) that it will be able to provide both services concurrently, which is pretty much impossible really.

The issue is that any XB1 or multiplatform developer needs to decide how to use the embedded memory pool: either as a framebuffer or as a low-latency scratchpad, with each case presenting its own issues, drawbacks and benefits depending on your usage. Crudely, you'll either come up against the size of the pool, or you'll hit bandwidth and fillrate limits by being forced to use main memory for your framebuffer and MRTs.
 
I probably didn't explain my point well.

I can see both designs easily suffering from CPU bottlenecks with such weak cores, more so in the case of the PS4 with its beefier GPU. What I'm saying is that maybe they can use that shared cache to reduce GPU stalls when waiting on the CPU, bypassing main memory. Not as the main objective of the eSRAM, but as an added benefit.

In my head it makes sense to try to milk every CPU cycle when the CPU is that slow. And I still think the eSRAM in the One isn't there to hold the framebuffer; that it isn't just local video memory for the GPU.

I suppose that makes sense.
 

FINALBOSS

Banned
That's interesting, got a link for that info? I'd love to see it. 300+ cycles to the L1 caches?



In fairness, threads move way too fast on here to catch every single post, especially when I'm largely ignoring the people being immature and just tossing out insults because I said something they perceive as a slight against their preferred console. It's no surprise that when I saw a more reasonable post such as Torro's, I gave it a proper response, because it's one of the few that deserves a response in threads like these nowadays. The large majority of posts tend to be insults and not more reasonable attempts at discussion. If that's not the case, then it just seems like it fairly often. Once I get the sense that a thread is turning into a cesspool of insult after insult, I just stop bothering with the thread and move on. Also, I kinda learned my lesson after my first ever ban prior to E3. You gotta know when to pull the plug. Say your piece and then move on.

No one is tossing out insults over perceived slights. And no one is tossing out these insults because you said something bad about their preferred console. You're getting negative responses because you are posting 100% grade A bullshit.
 
I probably didn't explain my point well.

I can see both designs easily suffering from CPU bottlenecks with such weak cores, more so in the case of the PS4 with its beefier GPU. What I'm saying is that maybe they can use that shared cache to reduce GPU stalls when waiting on the CPU, bypassing main memory. Not as the main objective of the eSRAM, but as an added benefit.

In my head it makes sense to try to milk every CPU cycle when the CPU is that slow. And I still think the eSRAM in the One isn't there to hold the framebuffer; that it isn't just local video memory for the GPU.

How so? I can't see how a 100 GFLOPS CPU can be the bottleneck here, especially considering that Jaguar cores are much better than Piledriver cores. If anything, this is the first time that consoles have had a proper x86 CPU.

http://beyond3d.com/showpost.php?p=1715619&postcount=155

Piledriver core (CPU used in Trinity) is designed for high clock speeds (turbo up to 4.3 GHz, overclock up to 8 GHz). In order to reach such high clocks, several sacrifices had to be made. The CPU pipelines had to be made longer, because there's less time to finish each pipeline stage (as clock cycles are shorter). The cache latencies are longer, because there's less time to move data around the chip (during a single clock cycle). The L1 caches are also simpler (less associativity) compared to Jaguar (and Intel designs). In order to combat the IPC loss of these sacrifices, some parts of the chip needed to be beefed up: The ROBs must be larger (more TLP is required to fill longer pipelines, more TLP is required to hide longer cache latencies / more often occurring misses because of lower L1 cache associativity) and the branch predictor must be better (since long pipeline causes more severe branch mis-predict penalties). All these extra transistors (and extra power) are needed just to negate the IPC loss caused by the high clock headroom.

Jaguar compute unit (4 cores) has the same theoretical peak performance per clock as a two-module (4 core) Piledriver. Jaguar has shorter pipelines and better caches (less latency, more associativity). Piledriver has slightly larger ROBs and a slightly better branch predictor. But these are required to negate the disadvantages in the cache design and the pipeline length. The per-module shared floating point pipeline in Piledriver is very good for single threaded tasks, but for multithreaded workloads the module design is a hindrance, because of various bottlenecks (shared 2-way L1 instruction cache and shared instruction decode). Steamroller will solve some of these bottlenecks (before the end of this year?), but it's still too early to discuss it (with the limited information available).

Jaguar and Piledriver IPC will be in the same ballpark. However, when running these chips at low clocks (<19W) all the transistors spent in the Piledriver design that allow the high clock ceiling are wasted, but all the disadvantages are still present. Thus Piledriver needs more power and more chip area to reach performance similar to Jaguar's. There's no way around this. The Jaguar core has better performance per watt.
 
Those Jaguar cores are two CPUs glued together, just like the old Core 2 Quads. A Core 2 Quad needed to go through the north bridge to access the other pair's L2; I don't know how AMD solved that in this APU. Even in that scenario, it was faster than accessing main memory.

I'm not saying that the eSRAM is intended for the CPU, but that it might also help with CPU performance, besides helping with CPU-GPU communication through a shared pool of cache. After all, the 360's GPU was able to read directly from the CPU's L2. That was one of its improvements over the PC's PCIe interface. That diagram doesn't provide enough data.

That diagram right there implies the CPU won't be accessing the eSRAM. The GPU has direct access; the CPU doesn't. The eSRAM is there for the GPU, not the CPU. The DDR3 is more than sufficient for the CPU. I think I remember reading in the VGLeaks docs that the CPU couldn't even access the eSRAM if it wanted to.
 

badb0y

Member
How so? I can't see how a 100 GFLOPS CPU can be the bottleneck here, especially considering that Jaguar cores are much better than Piledriver cores. If anything, this is the first time that consoles have had a proper x86 CPU.

http://beyond3d.com/showpost.php?p=1715619&postcount=155

Jaguar cores are weaker than Piledriver cores...

Jaguar is built for low-power notebooks/netbooks, while the Piledriver microarchitecture is used mainly for desktops.

I'm done here. You sound like a car dealer changing the subject every time I find a flaw in your argument.



That figure is pretty unrealistic. Latency should be between L2 and main RAM, not better than L2. Triple what you suggest would be awesome for the One.



Don't you believe the eSRAM here is going to be used as a next-level cache?

As I see it, it won't be used to store buffers as on the PS2 or 360, but as a cache, and in that scenario 32MB is more than enough for a 95% hit rate, Intel claims. People are forgetting that the DDR3 in the One is running at 2100 MHz, and that is more than enough not to bottleneck a mid-tier GPU like the One's APU, and to handle buffers, especially when you have included extra logic to move data.

I'm looking at the One diagrams, but they aren't too detailed. Can the CPU use this eSRAM in a useful way too? Having that pool of cache might lead to minor improvements in the CPU's IPC, starved as it is with 2MB of L2.



They aren't on the same die, for fab convenience. That way they can produce both the 5200 and the 4600 on the same line, then choose what to mount. And, in the future, they can still fab the eDRAM on the old node.
So, once you are caught with your pants down, you resort to childish retorts?

The bolded is so wrong it's embarrassing: the iGPUs in the Xbox One and PS4 are the strongest that have ever been put on the same die as a CPU. Nothing Intel or AMD offers right now even comes close to the iGPU performance of the Xbox One's or PS4's APU.
 
Jaguar cores are weaker than Piledriver cores...

Jaguar is built for low-power notebooks/netbooks, while the Piledriver microarchitecture is used mainly for desktops.

No they are not.

http://www.neogaf.com/forum/showpost.php?p=50472948&postcount=1417

Rolf NB said:
Jaguar actually has twice the SIMD throughput per clock and core when compared to Bulldozer. A Jaguar core has the same amount of SIMD resources as a Bulldozer module (two cores with a shared FP unit).

The original Xbox definitely had a proper x86 CPU.

Yeah you are right, I forgot about the original Xbox.
 

badb0y

Member
No they are not.

http://www.neogaf.com/forum/showpost.php?p=50472948&postcount=1417





Yeah you are right, I forgot about the original Xbox.

Yes they are.
http://www.anandtech.com/show/6976/...powering-xbox-one-playstation-4-kabini-temash
The average number of instructions executed per clock (IPC) is still below 1 for most client workloads. There's a certain amount of burst traffic to be expected but given the types of dependencies you see in most use cases, AMD felt the gain from making the machine wider wasn't worth the power tradeoff. There's also the danger of making the cat-cores too powerful. While just making them 3-issue to begin with wouldn't dramatically close the gap between the cat-cores and the Bulldozer family, there's still a desire for there to be clear separation between the two microarchitectures.

IIRC Piledriver cores have a 15% IPC improvement over Bulldozer.
 
No one is tossing out insults over perceived slights. And no one is tossing out these insults because you said something bad about their preferred console. You're getting negative responses because you are posting 100% grade A bullshit.
Your posts aren't any better. Why so much vitriol?
 
How so? I can't see how a 100 GFLOPS CPU can be the bottleneck here, especially considering that Jaguar cores are much better than Piledriver cores. If anything, this is the first time that consoles have had a proper x86 CPU.

http://beyond3d.com/showpost.php?p=1715619&postcount=155

Piledriver, as bad as it is, is the performance core from AMD; Jaguar is the budget one. They are even crappier than what Xenon/Cell were in 2006 and a huge disappointment to most of us hardware enthusiasts. You should also stop measuring CPUs using GFLOPS. Then take into account that, as pretty as that figure is, it is a theoretical number multiplied across eight cores. The single-threaded performance of those CPU cores is abysmal.

Those cores are being used because of AMD's failure with the Bulldozer/Piledriver power/performance ratio, nothing else.

That diagram right there implies the CPU won't be accessing the eSRAM. The GPU has direct access; the CPU doesn't. The eSRAM is there for the GPU, not the CPU. The DDR3 is more than sufficient for the CPU. I think I remember reading in the VGLeaks docs that the CPU couldn't even access the eSRAM if it wanted to.

That diagram doesn't imply that. In fact, it doesn't say anything concrete. That's why I want a better schematic of the One. If you can provide any insightful document about the CPU's (in)ability to access the eSRAM, I would be grateful.

So, once you are caught with your pants down, you resort to childish retorts?

The bolded is so wrong it's embarrassing: the iGPUs in the Xbox One and PS4 are the strongest that have ever been put on the same die as a CPU. Nothing Intel or AMD offers right now even comes close to the iGPU performance of the Xbox One's or PS4's APU.

What nonsense are you talking now? Both GPUs are unable to compete with mid-range discrete offerings from AMD/Nvidia. As convenient as they are for consoles, they lack the performance of desktop parts.

I'm not interested in your straw-man arguments. When I said eSRAM improves GPU performance, you replied that AMD will release Kaveri and wipe the floor with Intel (sic), after providing a benchmark using Intel parts without eDRAM. Something totally unrelated to my claim.

Now I argue that DDR3 RAM at 2100 MHz is enough for the One's GPU, and you come back with idiocies about how wonderful APUs are. Maybe you are smarter than the engineers at MS/AMD and can tell them that they are castrating their chips with an unsuitable memory system. Maybe you are still in time to save them billions in R&D.

With or without your approval, the One's GPU will work with DDR3.
 

diehard

Fleer
Yep. One reason I'm hoping the X1 can emulate the original Xbox. MS can write an emulator if they want to.

The OG Xbox was such a beast for the price. A Pentium 3 (although with less cache) and a GF3 would have been a legit great gaming PC when it was released.
 
Again, Jaguar has twice the SIMD throughput of Piledriver. I mean, this is not an opinion; this is a fact.

And?

Bulldozer modules share a lot of resources between every two integer cores; it's part of that philosophy. Even then, they are much more capable than those in the Jaguar architecture. Basically, you are saying something like the quad-core ARM in a Galaxy S3 is stronger than any Intel i3 because the latter has only two cores.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Don't you believe eSRAM here will going to be used as a next level cache?

We don't know whether the eSRAM can work as a hardware-managed cache. Nevertheless, whether it is a hardware-managed cache does not determine whether it'll be most helpful for storing buffers. These are two rather separate questions.

If it is not hardware-managed, using it explicitly for pixel buffers would make sense because this usage pattern fits nicely with many work patterns of a graphics pipeline: you read data from main memory and write results (pixels) to eSRAM simultaneously on two independent buses.
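A rough way to see the benefit of that split is as a bandwidth budget. All numbers here are illustrative assumptions about how one frame's traffic might divide, not measured figures:

```cpp
#include <cstdio>

int main() {
    const double ddr3_gbps = 68.0;       // main-memory bus figure quoted in this thread
    const double rt_writes_gbps = 30.0;  // assumed render-target/ROP write traffic

    // Everything in DDR3: texture reads compete with render-target writes on one bus.
    std::printf("All in DDR3 : ~%.0f GB/s left for texture reads\n", ddr3_gbps - rt_writes_gbps);

    // Render targets in eSRAM: the writes move onto the embedded pool's own bus.
    std::printf("RTs in eSRAM: ~%.0f GB/s left for texture reads\n", ddr3_gbps);
    return 0;
}
```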
 

badb0y

Member
What nonsense are you talking now? Both GPUs are unable to compete with mid-range discrete offerings from AMD/Nvidia. As convenient as they are for consoles, they lack the performance of desktop parts.
Spec-wise, the Xbox One's GPU is around an HD 7770 and the PS4's GPU is around an HD 7850/7870.
I'm not interested in your straw-man arguments. When I said eSRAM improves GPU performance, you replied that AMD will release Kaveri and wipe the floor with Intel (sic), after providing a benchmark using Intel parts without eDRAM. Something totally unrelated to my claim.
You are attributing the performance boost in those chips to the eDRAM when in reality GT3e simply has more EUs. Do you think that doubling the EU count provides no performance boost? Another thing to note is that you are comparing different product stacks; Iris Pro does not compete with Richland, just as Bulldozer does not compete with Atom.
Also, I never said eSRAM and eDRAM won't have a positive effect on performance. I am saying the reason eSRAM and eDRAM provide a performance boost is that they help make up for the crappy memory bandwidth of DDR3, not some latency-laden magical pixie dust you and Senjutssage keep talking about.
Now I argue that DDR3 RAM at 2100 MHz is enough for the One's GPU, and you come back with idiocies about how wonderful APUs are. Maybe you are smarter than the engineers at MS/AMD and can tell them that they are castrating their chips with an unsuitable memory system. Maybe you are still in time to save them billions in R&D.

With or without your approval, the One's GPU will work with DDR3.
It's not enough; that's why the eSRAM was implemented. Have you learned nothing? APUs are bandwidth starved, especially the ones used in the PS4 and Xbox One.

Also, I noticed you asked for a better schematic of the Xbox One's memory system:
[image: durango_memory2.jpg]
 
Piledriver, as bad as it is, is the performance core from AMD; Jaguar is the budget one. They are even crappier than what Xenon/Cell were in 2006 and a huge disappointment to most of us hardware enthusiasts. You should also stop measuring CPUs using GFLOPS. Then take into account that, as pretty as that figure is, it is a theoretical number multiplied across eight cores. The single-threaded performance of those CPU cores is abysmal.

Those cores are being used because of AMD's failure with the Bulldozer/Piledriver power/performance ratio, nothing else.

OK, so basically what you are saying is that we should just ignore the specs, the performance-per-watt ratio, and the multithreaded performance, and then conclude that Jaguar is a weak CPU.

And?

Bulldozer modules share a lot of resources between every two integer cores; it's part of that philosophy. Even then, they are much more capable than those in the Jaguar architecture. Basically, you are saying something like the quad-core ARM in a Galaxy S3 is stronger than any Intel i3 because the latter has only two cores.

And given the TDP budget of the PS4, there is no CPU that can provide the same performance as Jaguar.
 

badb0y

Member
OK, so basically what you are saying is that we should just ignore the specs, the performance-per-watt ratio, and the multithreaded performance, and then conclude that Jaguar is a weak CPU.

The only way you can conclude that Jaguar is better than Piledriver is if you ignore the facts.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
How interesting. The 68GB/s figure to main memory is the combination of the direct 42GB/s bus to main memory and the coherent 25GB/s bus over the northbridge?

Yeah, compared to the PS4's memory system, which apparently provides the GPU with the entire theoretical maximum of the main memory bus.

[image: lvp2.jpg]
 
OK, so basically what you are saying is that we should just ignore the specs, the performance-per-watt ratio, and the multithreaded performance, and then conclude that Jaguar is a weak CPU.



And given the TDP budget of the PS4, there is no CPU that can provide the same performance as Jaguar.

Actually Mr Power, your opponent is correct on this one.

SIMD performance is not important when looking at CPUs. SIMD stands for Single Instruction, Multiple Data, and it is the same processing philosophy that GPUs are designed for. CPUs are not designed to be great SIMD cores, only to have reasonable enough SIMD performance, which comes from their VMX/AVX/AVX2 extensions. It's vector performance, and that type of workload is far better suited to GPUs, which are effectively great big whopping SIMD arrays.

CPUs are serial processors, and so their "IPC" or "instructions per clock" rating is probably one of the best metrics for their performance. CPUs are quintessentially general-purpose cores, built to be able to do all kinds of different operations on a given piece of data each clock. Crudely, the more instructions they can pull off per clock, the better the performance.

GFLOPS and SIMD performance are pretty irrelevant for a CPU in a console. On PCs, when a single operation needs to be done on multiple pieces of data, it often doesn't make sense to move the work across to the GPU because of the massive latency penalty that would imply, so you have to use the CPU's SIMD units to perform the operation. On a console (and especially an APU-based system) such work could, in theory, be handed to the GPU on the fly, so you can use the massive SIMD array on your GPU to perform the operation an order of magnitude faster and then switch back for the more serial parts of the code. So in general CPU SIMD performance doesn't mean much to a console dev, whereas on PC it's a necessary evil (although on PC, Intel CPUs have killer SIMD performance now, since they mostly have integrated GPU parts on die anyway).

Edit:
Also, yeah, Jaguar is the evolution of the Bobcat cores: low-power, relatively low-performance cores. Piledriver is the successor to Bulldozer, and Steamroller is to be the evolution of Piledriver (which ironically was initially rumoured to be the PS4's CPU core, but Sony switched to Jaguar later on for the lower power profile and greater performance/area ratio).
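A minimal illustration of the serial-versus-SIMD distinction being drawn here. The first loop is a chain of dependent operations, so it is bounded by single-thread IPC and latency; the second applies one operation to many independent elements, which is exactly the shape that vector units and GPU compute eat up. The functions and data are placeholders, not anything console-specific:

```cpp
#include <vector>

// Serial: every iteration depends on the previous result, so wider SIMD
// units can't help; single-thread IPC and latency dominate.
float running_filter(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc = acc * 0.99f + x;  // loop-carried dependency
    return acc;
}

// Data-parallel: every element is independent, so one instruction can be
// applied to many elements at once (SSE/AVX on the CPU, or a GPU compute
// dispatch with one thread per element).
void scale_all(std::vector<float>& v, float k) {
    for (float& x : v) x *= k;
}
```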
 
How interesting. The 68GB/s figure to main memory is the combination of the direct 42GB/s bus to main memory and the coherent 25GB/s bus over the northbridge?

No, that's just one use-case example. The 68GB/s can be shared at any ratio between the northbridge and direct access. And on top of that the GPU has a 30GB/s link to the northbridge (which I take to be useful mainly for coordinating operations between the GPU and other components), so when the CPU/devices are reading data the GPU also has access to it.
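In other words, it is a single budget rather than two fixed lanes. A toy example of how the split might look in practice, where the CPU figure is an arbitrary assumption:

```cpp
#include <cstdio>

int main() {
    const double ddr3_total_gbps = 68.0;    // one pool, shared at whatever ratio is needed
    const double cpu_coherent_gbps = 20.0;  // assumed CPU/coherent traffic via the northbridge
    std::printf("Left for GPU direct access: ~%.0f GB/s\n",
                ddr3_total_gbps - cpu_coherent_gbps);
    return 0;
}
```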
 

Vesper73

Member
Isn't the rather large performance difference between the PS4 GPU and the Xbone GPU (along with the 7GB available vs 5GB) considerably more important than any bandwidth/latency considerations?
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
No, that's just one use-case example...

Makes sense. I just realized that this schematic gives "0 GB/s" bandwidth for the move engines, which would of course be nonsensical for a general bandwidth overview.
 

astraycat

Member
No, that's just one use-case example. The 68GB/s can be shared at any ratio between the northbridge and direct access. And on top of that the GPU has a 30GB/s link to the northbridge (which I take to be useful mainly for coordinating operations between the GPU and other components), so when the CPU/devices are reading data the GPU also has access to it.

I'm not sure we're looking at the same picture here.
 
OK, so basically what you are saying is that we should just ignore the specs, the performance-per-watt ratio, and the multithreaded performance, and then conclude that Jaguar is a weak CPU.

And given the TDP budget of the PS4, there is no CPU that can provide the same performance as Jaguar.

[image: cinebench.png]

Then Bulldozer, both single- and multi-threaded:

[image: 41694.png]

[image: 41695.png]

Bulldozer has six times the performance of Jaguar in multithreaded environments and triple the IPC per core.

[image: 3dmark.png]

Any Core i3 can nuke Jaguar at a given TDP. The problem is that Intel doesn't license as cheaply as AMD. Don't try to argue that Jaguar isn't a weak mobile/notebook CPU. Please, do yourself a favor.
 
Piledriver, as bad as it is, is the performance core from AMD; Jaguar is the budget one. They are even crappier than what Xenon/Cell were in 2006 and a huge disappointment to most of us hardware enthusiasts. You should also stop measuring CPUs using GFLOPS. Then take into account that, as pretty as that figure is, it is a theoretical number multiplied across eight cores. The single-threaded performance of those CPU cores is abysmal.


For the amount of juice it soaks up, coupled with the tiny amount of die area a Jaguar core takes up, I would argue that an IPC approaching unity is pretty darn solid performance. Sure, it's not as good as Piledriver, nor does it even approach any of the latest Intel cores, but for its size and performance per watt, Jaguar isn't too shabby.

The point is, and I have heard this from a very reputable dev before, that for console gaming these games don't actually need a core with massive single-threaded throughput. Most graphics and gameplay code is embarrassingly parallelizable, and as such going with a "many weaker cores" solution over a "fewer fat cores" chip would prove the better choice in 99.9% of usage cases.

Games simply don't need CPUs with massive IPC. Both Cell and Xenon were relatively weak cores for their time, especially in single-threaded performance. Their biggest benefit was their SIMD performance, and that was what made them most valuable. These days, however, with GPUs closely coupled to the CPU and able to handle compute more efficiently, the GPU is almost always the better choice for performance-critical code.

CPUs certainly won't be the bottleneck on any of these systems next gen. If anything, memory bandwidth will be (well, on one of the platforms anyway).
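A small sketch of the "many weaker cores" argument: work shaped like this has no dependencies between elements, so it scales across eight slow cores, or a GPU compute dispatch, about as well as across two fast ones. The particle struct and integrator are illustrative, not engine code:

```cpp
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// Each particle is updated independently of the others, so the loop splits
// cleanly across many threads, or moves wholesale to GPU compute on an APU.
void integrate(std::vector<Particle>& ps, float dt, float gravity) {
    for (auto& p : ps) {
        p.vy -= gravity * dt;
        p.x  += p.vx * dt;
        p.y  += p.vy * dt;
        p.z  += p.vz * dt;
    }
}
```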
 
I'm not sure we're looking at the same picture here.

We are, this picture:

[image: durango_memory2.jpg]

is a bandwidth-distribution example based on typical CPU and GPU usage.

This picture (also from VGLeaks):

[image: durango_memory.jpg]

shows the bus bandwidths across the entire system. As you can see there, 68GB/s is shared between the north bridge and the GPU memory system. Also, there's a dedicated 30GB/s link between the GPU memory system and the north bridge.
 

astraycat

Member
Why is that diagram different from the actual diagram posted by Vgleaks?

We are, this picture:

[image: durango_memory2.jpg]

is a bandwidth-distribution example based on typical CPU and GPU usage.

This picture (also from VGLeaks):

[image: durango_memory.jpg]

shows the bus bandwidths across the entire system. As you can see there, 68GB/s is shared between the north bridge and the GPU memory system. Also, there's a dedicated 30GB/s link between the GPU memory system and the north bridge.

Ah, that makes more sense. I was only commenting based on the one posted by badb0y.
 
Actually Mr Power, your opponent is correct on this one.

GFLOPS and SIMD performance are pretty irrelevant for a CPU in a console. On PCs, when a single operation needs to be done on multiple pieces of data, it often doesn't make sense to move the work across to the GPU because of the massive latency penalty that would imply, so you have to use the CPU's SIMD units to perform the operation. On a console (and especially an APU-based system) such work could, in theory, be handed to the GPU on the fly, so you can use the massive SIMD array on your GPU to perform the operation an order of magnitude faster and then switch back for the more serial parts of the code. So in general CPU SIMD performance doesn't mean much to a console dev, whereas on PC it's a necessary evil (although on PC, Intel CPUs have killer SIMD performance now, since they mostly have integrated GPU parts on die anyway).

Sorry, I completely disagree. You need the CPU to generate the vertices that are processed by the GPU. It's not like the GPU can generate the geometry by itself with no input from the CPU. And guess what? Generating vertices is a floating-point-intensive operation. Thus, GFLOPS are extremely important for a console CPU.
 
Sorry, I completely disagree. You need the CPU to generate the vertices that are processed by the GPU. It's not like the GPU can generate the geometry by itself with no input from the CPU. And guess what? Generating vertices is a floating-point-intensive operation. Thus, GFLOPS are extremely important for a console CPU.

Not with the last 5-6 years of GPU tech you don't. GPUs have been able to generate their own vertices for a while now. What the heck do you think tessellation is?
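A hedged sketch of why tessellation changes the accounting: the CPU submits a comparatively small set of control points and the GPU amplifies them, so the final vertex count no longer tracks CPU floating-point work. The patch count and factor below are made-up numbers, and the triangle estimate is only approximate (the exact count depends on the partitioning mode):

```cpp
#include <cstdio>

int main() {
    const long patches = 10000;   // control-point patches submitted by the CPU (assumed)
    const long factor  = 16;      // per-edge tessellation factor (assumed)
    // A patch tessellated with factor N yields on the order of N^2 triangles.
    std::printf("CPU submits %ld patches; GPU emits ~%ld triangles\n",
                patches, patches * factor * factor);
    return 0;
}
```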
 
One of the reasons it's good Sony invested in those CUs is that the CPUs in these machines are straight budget level.

The only way you're going to see a big jump in the complexity of physics in a game is through GPGPU.
 

RoboPlato

I'd be in the dick
One of the reasons it's good Sony invested in those CUs is that the CPUs in these machines are straight budget level.

The only way you're going to see a big jump in the complexity of physics in a game is through GPGPU.

Yeah, it was a smart move, since you can do a lot of that sort of thing on the GPU instead of the CPU. The CPU is merely sufficient for next gen, which is why there's stuff like the dedicated audio hardware, just to get as many tasks off the CPU as possible, while the GPU and RAM are what are truly next gen. Customizing the GPU to be able to offload those extra compute tasks, while not taking much power away from graphics processing, is probably the best customization they could have done given their setup.
 
Not with the last 5-6 years of GPU tech you don't. GPUs have been able to generate their own vertices for a while now. What the heck do you think tessellation is?

Nope, you can't generate anything if you don't have a CPU. Tessellation can generate new geometry based on the already existing input sent by the CPU. But if you don't have the CPU generating the original vertices, you have nothing to start with to generate this new geometry.
 