
PS4's memory subsystem has separate buses for the CPU (~20GB/s) and the GPU (176GB/s)

Nah there was an actual downgrade.

Price

 
So overall, I'd argue yes to your question.

The thing that puzzles me is the developers' mention of the need to allocate data correctly. When I first read about AMD's HSA, I assumed that everything would be unified and could be customized to the needs of the developers as required. I thought there wouldn't be any need at all for data separation, as with my limited knowledge it seemed to me that CPU and GPU data would be fed through a common bus and allocated dynamically as needed. Now I'm confused :)
 
The thing that puzzles me is the developers' mention of the need to allocate data correctly. When I first read about AMD's HSA, I assumed that everything would be unified and could be customized to the needs of the developers as required. I thought there wouldn't be any need at all for data separation, as with my limited knowledge it seemed to me that CPU and GPU data would be fed through a common bus and allocated dynamically as needed. Now I'm confused :)

I"m certainly not overly knowledgeable on this but wouldn't there be a learning curve for devs using unified memory?

And hence need to learn how to correctly allocate data for the new unified memory system that they might not be familiar with?

Or am reading into it incorrectly?
 

benny_a

extra source of jiggaflops
The thing that puzzles me is the developers' mention of the need to allocate data correctly. When I first read about AMD's HSA, I assumed that everything would be unified and could be customized to the needs of the developers as required. I thought there wouldn't be any need at all for data separation, as with my limited knowledge it seemed to me that CPU and GPU data would be fed through a common bus and allocated dynamically as needed. Now I'm confused :)
But there isn't any need for data separation. I think it's just the author trying to pack more facts into the article, but putting the fact that the PS3 has split memory right next to the fact that the PS4 uses more than one bus to access the same pool of RAM gives a sense of equivalence where there isn't one.

i'm so ignorant about tech stuff, i need DBZ charts pls.
Technically nothing has changed, so the old DBZ charts are still valid. What is surprising is how quickly they got something running. I don't follow DBZ, but I guess if Goku had a son he would be birthed with ultra-speed.
 

GameSeeker

Member
Xbone's low spec holding PS4 back confirmed?


Incorrect. Please read the complete article.

---
The full quote from Eurogamer is:
"The PS4's GPU is very programmable. There's a lot of power in there that we're just not using yet. So what we want to do are some PS4-specific things for our rendering but within reason - it's a cross-platform game so we can't do too much that's PS4-specific," he reveals.

"There are two things we want to look into: asynchronous compute where we can actually run compute jobs in parallel... We [also] have low-level access to the fragment-processing hardware which allows us to do some quite interesting things with anti-aliasing and a few other effects."
----

Asynchronous compute is one of the areas where the PS4 has a very significant advantage over the Xbone. The PS4 has extra CUs (18 vs. 12 for the Xbone) that can be applied to asynchronous compute. The PS4 also has custom-modified compute queues: 64 vs. the standard 2 on AMD GCN parts.

It's great that PS4 ports are already looking at taking advantage of asynchronous compute this early in the lifecycle.
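To give a rough feel for what "running compute jobs in parallel" means, here's a minimal sketch of the idea in plain C++. The real PS4 API is under NDA, so the queues here are just stand-ins modelled with threads and the job names are made up; the only point is that independent compute jobs get fed to multiple queues while the graphics work carries on.

```cpp
#include <array>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct ComputeQueue {
    std::queue<std::function<void()>> jobs;
    std::mutex m;

    void submit(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(m);
        jobs.push(std::move(job));
    }

    void drain() {  // runs on its own worker thread
        for (;;) {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(m);
                if (jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job();  // do the compute work outside the lock
        }
    }
};

int main() {
    // Four queues standing in for the PS4's many hardware compute queues.
    std::array<ComputeQueue, 4> queues;
    queues[0].submit([] { std::puts("compute: post-process blur"); });
    queues[1].submit([] { std::puts("compute: particle simulation"); });
    queues[2].submit([] { std::puts("compute: skinning"); });
    queues[3].submit([] { std::puts("compute: audio occlusion rays"); });

    // Compute jobs drain in parallel with the "graphics" work below.
    std::vector<std::thread> workers;
    for (auto& q : queues)
        workers.emplace_back([&q] { q.drain(); });

    std::puts("graphics: rendering the frame on the main queue...");

    for (auto& w : workers) w.join();
    return 0;
}
```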
 
Incorrect. Please read the complete article.

---
The full quote from Eurogamer is:
"The PS4's GPU is very programmable. There's a lot of power in there that we're just not using yet. So what we want to do are some PS4-specific things for our rendering but within reason - it's a cross-platform game so we can't do too much that's PS4-specific," he reveals.

"There are two things we want to look into: asynchronous compute where we can actually run compute jobs in parallel... We [also] have low-level access to the fragment-processing hardware which allows us to do some quite interesting things with anti-aliasing and a few other effects."
----

Asynchronous compute is one of the areas where the PS4 has a very significant advantage over the Xbone. The PS4 has extra CUs (18 vs. 12 for the Xbone) that can be applied to asynchronous compute. The PS4 also has custom-modified compute queues: 64 vs. the standard 2 on AMD GCN parts.

It's great that PS4 ports are already looking at taking advantage of asynchronous compute this early in the lifecycle.

Though I don't agree with the person you corrected, the part you highlighted reads exactly like what he was trying to say...
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
The thing that puzzles me is the developers' mention of the need to allocate data correctly. When I first read about AMD's HSA, I assumed that everything would be unified and could be customized to the needs of the developers as required.

It is unified in that it has a unified address space. What they are talking about is that you have to decide whether your memory access commands are checked against the CPU's L1/L2 caches (Onion) or not (Garlic). L1/L2 are essential for the CPU's management of latency, and latency-tolerant memory accesses from the GPU would pollute them without needing them. Hence the two pathways. As a developer, you have to decide whether CPU or GPU access performance is most crucial for a given data set. In the former case, you would issue memory access commands via Onion; in the latter, via Garlic. If the GPU wants to read/write data relevant to the CPU for asynchronous compute, it can use Onion+ to use the CPU's L1/L2 caches and, analogously, not pollute its own internal caches.

It is not a contradiction to HSA but, on the contrary, a requirement.
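To make that decision concrete, here's a hypothetical sketch in C++. The bus names are real, but mapMemory() and its Path flag are invented stand-ins for illustration, since the actual PS4 allocation API isn't public; here the "allocator" is just malloc.

```cpp
#include <cstddef>
#include <cstdlib>

// Which pathway a given allocation should be reached through.
enum class Path {
    Onion,   // checked against the CPU's L1/L2 -> for data the CPU touches often
    Garlic   // bypasses the CPU caches, very wide -> for GPU-heavy data
};

// Made-up stand-in: here it is just malloc. On the real hardware the path would
// be chosen through the platform's (non-public) memory-mapping API.
void* mapMemory(std::size_t bytes, Path /*path*/) {
    return std::malloc(bytes);
}

int main() {
    // The CPU reads/writes this every frame, so keep it cache-friendly.
    void* gameState   = mapMemory(8u * 1024 * 1024, Path::Onion);

    // The GPU streams through these; the CPU rarely looks at them.
    void* vertexData  = mapMemory(64u * 1024 * 1024, Path::Garlic);
    void* textureData = mapMemory(256u * 1024 * 1024, Path::Garlic);

    std::free(textureData);
    std::free(vertexData);
    std::free(gameState);
    return 0;
}
```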
 
It is unified in that it has a unified address space. What they are talking about is that you have to decide whether your memory access commands are checked against the CPU's L1/L2 caches (Onion) or not (Garlic). L1/L2 are essential for the CPU's management of latency, and latency-tolerant memory accesses from the GPU would pollute them without needing them. Hence the two pathways. As a developer, you have to decide whether CPU or GPU access performance is most crucial for a given data set.

It is not a contradiction to HSA but, on the contrary, a requirement.

So this would in fact give devs more options?

If they wanted to be "lazy" they could just use the Onion approach, or they could go the Garlic route and potentially gain more power/speed?

I apologize if my post asks old questions but I wasn't here in March :p
 

Nafai1123

Banned
Example of how the Onion/Garlic buses can be used.

"One's called the Onion, one's called the Garlic bus. Onion is mapped through the CPU caches... This allows the CPU to have good access to memory," explains Jenner.

"Garlic bypasses the CPU caches and has very high bandwidth suitable for graphics programming, which goes straight to the GPU. It's important to think about how you're allocating your memory based on what you're going to put in there."

"One issue we had was that we had some of our shaders allocated in Garlic but the constant writing code actually had to read something from the shaders to understand what it was meant to be writing - and because that was in Garlic memory, that was a very slow read because it's not going through the CPU caches. That was one issue we had to sort out early on, making sure that everything is split into the correct memory regions otherwise that can really slow you down."

So elements like the main system heap (containing the main store of game variables), key shader data, and render targets that need to be read by the CPU are allocated to Onion memory, while more GPU-focused elements like vertex and texture data, shader code and the majority of the render targets are kept in the ultra-wide Garlic memory.

Really interesting read. It's great news that 3rd parties are already coming this far with the tools and the lowest-level GNM API this early. Sounds like the GNMX wrapper API still needs some work but is improving.

"Most people start with the GNMX API which wraps around GNM and manages the more esoteric GPU details in a way that's a lot more familiar if you're used to platforms like D3D11. We started with the high-level one but eventually we moved to the low-level API because it suits our uses a little better," says O'Connor, explaining that while GNMX is a lot simpler to work with, it removes much of the custom access to the PS4 GPU, and also incurs a significant CPU hit.

"The other thing we did is to look at constant setting. GNMX - which is Sony's graphics engine - has a component called the Constant Update Engine which handles setting all the constants that need to go to the GPU. That was slower than we would have liked. It was taking up a lot of CPU time. Now Sony has actually improved this, so in later releases of the SDK there is a faster version of the CUE, but we decided we'd handle this ourselves because we have a lot of knowledge about how our engine accesses data and when things need to be updated than the more general-purpose implementation... So we can actually make this faster than the version we had at the time."
 

stryke

Member
These kinds of tech presentations are always interesting, really appreciate Reflections for doing this (and Sony for allowing them to). Hopefully we get more of these leading up to launch.

Ubisoft seem very chummy with Sony lately
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
This is good right?

Yes.

Without the two pathways, either (a) all memory access would go through the L1/L2 caches and the GPU would pollute them by caching data that is irrelevant to the CPU, which is highly dependent on L1/L2, or (b) all memory access would go directly to memory, causing the CPU to die the death of latency, since CPUs cannot hide latency the way GPUs can.
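If you want to see point (a) for yourself, here's a toy C++ microbenchmark. Nothing PS4-specific: the array sizes are assumptions about typical cache sizes, and the numbers are machine-dependent. A big streaming pass stands in for GPU-style traffic and evicts the small hot working set, which then reads slower.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

static volatile long long sink;  // keeps the compiler from discarding the sums

long long sumOver(const std::vector<int>& v) {
    long long s = 0;
    for (int x : v) s += x;
    return s;
}

double timeMs(const std::vector<int>& v) {
    auto t0 = std::chrono::steady_clock::now();
    sink = sumOver(v);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::vector<int> hot(64 * 1024, 1);              // ~256 KB: a cache-friendly working set
    std::vector<int> streaming(32 * 1024 * 1024, 1); // ~128 MB: a cache-thrashing stream

    timeMs(hot);                    // warm the caches
    double before = timeMs(hot);
    timeMs(streaming);              // the "GPU-style" pass evicts the hot set
    double after = timeMs(hot);

    std::printf("hot set before pollution: %.3f ms, after: %.3f ms\n", before, after);
    return 0;
}
```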
 

vpance

Member
So PS4-specific optimizations = better AA, more fx. And at launch too. Bodes well for the future of multiplats.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
So if the unified memory doesn't negate the need to separate data in the way you described, what is its main benefit?

They are not talking about separating memory but about how to allocate/access it. All memory is visible to both the CPU and the GPU. They are only talking about whether to access certain regions of memory via one pathway or the other, depending on whether CPU or GPU performance is most relevant to a particular data set.
 

Xenon

Member
Something told me we were not going to go into next gen without hearing about the unlocked power these machines have. So all we need now is a hard number for the percentage of the machine's potential the first-gen games are going to be using: 50, 70, 90 or 110%.
 
It is unified in that it has a unified address space. What they are talking about is that you have to decide whether your memory access commands are checked against the CPU's L1/L2 caches (Onion) or not (Garlic). L1/L2 are essential for the CPU's management of latency, and latency-tolerant memory accesses from the GPU would pollute them without needing them. Hence the two pathways. As a developer, you have to decide whether CPU or GPU access performance is most crucial for a given data set. In the former case, you would issue memory access commands via Onion; in the latter, via Garlic. If the GPU wants to read/write data relevant to the CPU for asynchronous compute, it can use Onion+ to use the CPU's L1/L2 caches and, analogously, not pollute its own internal caches.

It is not a contradiction to HSA but, on the contrary, a requirement.

Yes.

Without the two pathways, either (a) all memory access would go through the L1/L2 caches and the GPU would pollute them by caching data that is irrelevant to the CPU, which is highly dependent on L1/L2, or (b) all memory access would go directly to memory, causing the CPU to die the death of latency, since CPUs cannot hide latency the way GPUs can.

Thanks so much for the detailed explanation, I appreciate it!
 

WolvenOne

Member
I have limited tech savviness at my disposal, but I'm reading this as a shortcut to reduce latencies and bottlenecks, i.e. when one bus is occupied, things don't suddenly come screeching to a halt in terms of performance.

It's little things like this that help consoles punch above their weight class relative to PCs. Vanilla off-the-shelf PCs tend to have slightly less optimized architectures.

Also, I'm REALLY pleased with how easy PS4 development appears to be so far. I really hope it leads to a lot of new smaller titles later on. I'm really hungry for new franchises, and creative gameplay concepts.
 

benny_a

extra source of jiggaflops
Fuc*ing lol!

But yeah, Cerny really thought about pretty much everything. We will all reap the benefits of their hard work.
We all love Cerny (except the guy with the anti-Cerny avatar), but let's not discount the multitude of hardware engineers in Japan who probably proposed a lot of these things and then implemented them.

He is calling the shots, but he is also the representative of the hardware team.
 
We all love Cerny (except the guy with the anti-Cerny avatar), but let's not discount the multitude of hardware engineers in Japan who probably proposed a lot of these things and then implemented them.

He is calling the shots, but he is also the representative of the hardware team.

I love what Cerny represents :p
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
I have limited tech savviness at my disposal, but I'm reading this as a shortcut to reduce latencies and bottlenecks, i.e. when one bus is occupied, things don't suddenly come screeching to a halt in terms of performance.

It is really just about whether an individual address in main memory should be cached in the CPU's L1/L2 (Onion) or not (Garlic). CPU L1/L2 is (a) of limited size and (b) highly relevant to the CPU but at the same time irrelevant to the GPU. Hence, you issue access commands for CPU-relevant data through Onion and access commands for CPU-irrelevant data through Garlic. As a result, the GPU does not bully the CPU.
 

Panajev2001a

GAF's Pleasant Genius
This is the impressive part:



And this:



This bodes really well for PS4 ports being of very high quality and the best versions on consoles.

This is really good. Months for 2-3 people for such a task at this stage of the console's lifecycle (months before launch) is quite impressive.
 

Orayn

Member
Cool stuff! Seems like this lets them sidestep the one potential drawback of the GDDR5, and choosing which bus to use sounds like a pretty trivial form of optimization compared to the PS3's split memory and Cell SPE shenanigans.

It was almost bad news.

It can still be bad news if you truly want it to be. You just need to BELIEVE! The Onion bus has a smaller number, you see, therefore fewer GokuFLOPS and bits.
 

DieH@rd

Banned
Really nice article, tons of details on the PS4 architecture and SDK tools. Sony seems to be in a very good position with the PS4 deployment.


The article also confirmed two things: 2 CPU cores are dedicated to the PS4 OS, and the PSN download speed limit is [was] 12Mbps.
 

Shikoro

Member
We all love Cerny (except the guy with the anti-Cerny avatar), but let's not discount the multitude of hardware engineers in Japan who probably proposed a lot of these things and then implemented them.

He is calling the shots, but he is also the representative of the hardware team.

Of course, they all did a wonderful job and I, just like a lot of people here, can't wait to see it in action for myself.
 
This doesn't seem that complicated to me. Developers can program it like a more standard system and pass information from the CPU to the GPU through main memory if they want, or they can pass it straight from the CPU to the GPU over the Onion bus, which goes through the CPU caches rather than bypassing them.
 