Support NeoGAF

TheRealTalker · Aug 29, 2013

As we promised in our previous article, we present new information about the enhancements in the memory system in PlayStation 4.

http://www.vgleaks.com/more-exclusi...plementation-and-memory-enhancements-details/

Bypass Bits

- If many of these sorts of compute shaders are being run simultaneously, there is “cross talk” in that one compute dispatch may force an invalidate or a premature flush of another dispatch’s SC memory

- As a result of this (and other factors), it may be optimal to bypass either the L1, or the L2, or both

Bypassing all caches for the accesses to the shared CPU-GPU memory (effectively making the data UC rather than SC) will remove the need for the invalidates and writebacks of L1 and L2
At the same time, there will be more – perhaps much more – traffic to and from system memory
- It is possible to change the V# and T# definitions on a dispatch by dispatch basis when exploring these issues and tuning the application

- However, in order to allow for a more stable and debuggable programming approach, two override bits have been added to the draw call and dispatch controls
The L1 bypass bit specifies that operations on GC and SC memory bypass the L1 and go directly to L2
The L2 bypass bit specifies that operations on SC memory bypass the L2, using the new “Onion+” bus
This allows the application programmer to use the same shader code and V#/T# definitions, and then run the shaders with several different cache flush strategies. No recompilation or reconfiguration is required
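
A minimal sketch of how the two override bits described above could be modelled. This is not the PS4 API: `MemType`, `caches_used` and the string labels are hypothetical names, and the sketch assumes that without any bypass every non-UC memory type goes through both L1 and L2, and that the two bits act independently.

```python
from enum import Enum, auto


class MemType(Enum):
    # Memory type labels as they appear in the article (hypothetical enum).
    RO = auto()
    PV = auto()
    GC = auto()
    SC = auto()
    UC = auto()


def caches_used(mem_type, l1_bypass=False, l2_bypass=False):
    """Return the set of cache levels an access of this type goes through."""
    if mem_type is MemType.UC:
        # UC data bypasses all caches.
        return set()
    levels = {"L1", "L2"}
    # L1 bypass bit: applies to GC and SC memory only.
    if l1_bypass and mem_type in (MemType.GC, MemType.SC):
        levels.discard("L1")
    # L2 bypass bit: applies to SC memory only.
    if l2_bypass and mem_type is MemType.SC:
        levels.discard("L2")
    return levels


# Same "shader", four cache strategies, no change to the memory type itself.
for l1, l2 in [(False, False), (True, False), (False, True), (True, True)]:
    print(l1, l2, sorted(caches_used(MemType.SC, l1, l2)))
```

With both bits set, SC accesses touch no cache at all in this model, which matches the article's remark that bypassing everything effectively makes the data UC.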

Four Memory Buffer Usage Examples

1) Simple Rendering

- Vertex shader and pixel shader only; the pixel shader does not perform direct memory accesses

- Vertex buffers (RO)

- Textures (RO)

- Color and depth buffers are written using dedicated hardware mechanisms, not memory buffers

2) Raycast

- In order to compute visibility (“can the enemy see the player”) or sound effect volume (“is there a direct path from audio source to player”), sets of 64 rays are compared against large triangle databases

- Triangle databases (RO)

- Input rays (SC)

- Output collisions (SC)

- The raycast probably doesn’t use much SC data and could potentially entirely bypass L2

3) Procedural Geometry (e.g. water surface)

- The CPU maintains a high level state of the water (ripples, splashes coming from interactions with game objects). The GPU generates the detailed water mesh, which is used only for rendering

- Input: water state as maintained by CPU (SC)

- Output: detailed water surface (GC)

4) Chained compute shaders

- Compute shaders write semaphores for the CP to read, enabling other compute dispatches (and perhaps draw calls) to run. They also add packets to compute pipe queues (perhaps packets that kick off more compute dispatches)

- Various buffers (RO, PV, GC, SC)

- Semaphores (UC)

- Compute pipe queue (UC)

- NOTE that the CP does not have access to the GPU L2, so semaphores and queue contents must either be assigned the SC memory type (visible to the CP only after an L2 writeback) or the UC memory type (which bypasses the L2)

- Using UC can allow for greater flexibility, e.g. a compute dispatch can have several stages that send and receive semaphores. Using SC requires the dispatch to terminate before the semaphore is visible externally
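
The SC versus UC trade-off for semaphores in example 4 can be illustrated with a toy model. Everything here is a simplification for illustration: `L2Model`, its methods and the address strings are hypothetical, and the only behaviour modelled is that SC writes sit in the L2 until a writeback while UC writes go straight to system memory.

```python
class L2Model:
    """Toy model: the CP can only see system memory, never the GPU L2."""

    def __init__(self):
        self.system_memory = {}  # what the CP can read
        self.dirty_lines = {}    # SC data written by the GPU, still in L2

    def gpu_write(self, addr, value, mem_type):
        if mem_type == "UC":
            # UC bypasses the L2, so the write is visible right away.
            self.system_memory[addr] = value
        elif mem_type == "SC":
            # SC data stays in the L2 until a writeback happens.
            self.dirty_lines[addr] = value
        else:
            raise ValueError(f"memory type not modelled: {mem_type}")

    def writeback(self):
        # Flush all dirty SC lines to system memory.
        self.system_memory.update(self.dirty_lines)
        self.dirty_lines.clear()

    def cp_read(self, addr):
        return self.system_memory.get(addr)


l2 = L2Model()
l2.gpu_write("sem_uc", 1, "UC")
l2.gpu_write("sem_sc", 1, "SC")
print(l2.cp_read("sem_uc"), l2.cp_read("sem_sc"))  # SC semaphore not visible yet
l2.writeback()
print(l2.cp_read("sem_uc"), l2.cp_read("sem_sc"))  # both visible after writeback
```

In this model a multi-stage dispatch can signal through the UC semaphore while it is still running, whereas the SC semaphore only shows up once the writeback has happened.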

Strategies for Scalar Loads

- In addition to the “gather read” and “scatter write” loads into VGPRs (Vector GPRs), the R10xx core also supports scalar reads and writes into SGPRs (Scalar GPRs)

Typically, scalar reads are used to load T#, V#, and S# structures, as well as any other data that applies to the wavefront as a whole (as opposed to the vector reads that load data on a thread-by-thread basis)
- These read operations use the L2, but instead of the L1 they use a different cache called the “K-cache”. There is one 16 KB K-cache for every three CUs

The K-cache must be invalidated when there is the possibility that it may contain “stale” data, e.g. a later draw call or dispatch uses the same location in the T# (etc) ring buffer as an earlier call
K-cache invalidation takes 1 cycle but dumps all data, resulting in a high cost
The most straightforward way of reducing the invalidation count is to use larger ring buffers for the scalar input data to the draw calls and dispatches
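
A rough sketch of why a larger ring buffer reduces K-cache invalidations. The model is an assumption, not the hardware's actual policy: each draw call or dispatch takes the next ring slot in order, every slot it uses ends up in the K-cache, and the whole K-cache is invalidated whenever a slot is about to be reused. `count_kcache_invalidates` is a hypothetical helper.

```python
def count_kcache_invalidates(num_calls, ring_slots):
    """Count full K-cache invalidates for num_calls sequential calls."""
    cached_slots = set()  # ring slots that may still be in the K-cache
    invalidates = 0
    for call in range(num_calls):
        slot = call % ring_slots
        if slot in cached_slots:
            # The slot is being reused, so the cached copy may be stale:
            # dump the entire K-cache.
            invalidates += 1
            cached_slots.clear()
        cached_slots.add(slot)
    return invalidates


for slots in (64, 256, 1024):
    print(slots, count_kcache_invalidates(1000, slots))
```

Under these assumptions, 1000 calls need 15 invalidates with a 64-slot ring, 3 with 256 slots, and none once the ring is larger than the number of calls.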

Performance

- Performance of the L2 cache operations is much better on Liverpool than on R10xx

- The L2 invalidate typically takes 300-350 cycles

All in-flight memory transactions must settle before the invalidate can be completed
A small overhead (about 75 cycles) is required to locate and invalidate the lines
This results in the direct cost listed above. There is also an indirect cost, in that invalidated SC data must potentially be reloaded
- The cost of an L2 writeback depends on the amount of data that must be written back to system memory

The Onion bus can support 10GB/sec, which means 12.5 bytes/cycle (0.2 lines/cycle)
If we attribute 160 GB/sec of the Garlic bus to the GPU, the bus can support 200 bytes/cycle (3.125 lines/cycle)
- If there is only a little SC dirty data present in the L2, the writeback is fairly fast

4K bytes worth of dirty Onion SC lines will take perhaps 500 cycles (Onion bottleneck PLUS small overhead to locate lines PLUS latency to system memory)
20K bytes worth of dirty Garlic SC lines will take about the same time
- Worst case L2 writeback cost is basically the Onion or Garlic cost of writing 512 KB (about 40,000 cycles and 3,000 cycles respectively)
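
The bandwidth figures above can be checked with a few lines of arithmetic. The clock rate is not stated in the article; it is inferred here from 10 GB/sec being quoted as 12.5 bytes/cycle. The 64-byte line size is likewise an assumption, chosen because it makes 200 bytes/cycle equal 3.125 lines/cycle. Only the raw transfer term is computed, not the lookup overhead or memory latency.

```python
ONION_BYTES_PER_SEC = 10e9    # from the article
GARLIC_BYTES_PER_SEC = 160e9  # share of Garlic attributed to the GPU
ONION_BYTES_PER_CYCLE = 12.5  # from the article

# Inferred, not stated: 10e9 / 12.5 = 800 MHz.
CLOCK_HZ = ONION_BYTES_PER_SEC / ONION_BYTES_PER_CYCLE
# Assumed cache line size, consistent with 200 bytes/cycle = 3.125 lines/cycle.
LINE_BYTES = 64


def bytes_per_cycle(bytes_per_sec):
    return bytes_per_sec / CLOCK_HZ


def transfer_cycles(num_bytes, bytes_per_sec):
    """Cycles spent purely on moving num_bytes over the given bus."""
    return num_bytes / bytes_per_cycle(bytes_per_sec)


for name, bw in (("Onion", ONION_BYTES_PER_SEC), ("Garlic", GARLIC_BYTES_PER_SEC)):
    bpc = bytes_per_cycle(bw)
    worst = transfer_cycles(512 * 1024, bw)
    print(f"{name}: {bpc} bytes/cycle, {bpc / LINE_BYTES:.3f} lines/cycle, "
          f"512 KB writeback ~{worst:.0f} cycles")
```

The worst-case numbers come out at roughly 41,900 cycles for Onion and 2,600 for Garlic, in line with the rounded "about 40,000" and "3,000" quoted above. For the small cases, 4 KB over Onion is about 330 transfer cycles, so the quoted 500 cycles leaves room for the line lookup overhead and memory latency.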

Additional Optimizations

- There are further optimizations in the L1 and L2 caches

- The L2 cache has dirty state tracking

If the L2 has performed no reads from SC memory since the last invalidate, it will ignore any requests to invalidate
If the L2 has performed no writes to SC memory since the last writeback, it will ignore any requests to perform a writeback
This will help performance in the situation where multiple pipes are requesting invalidates and writebacks, e.g. several compute pipes are separately dispatching compute shaders that use SC memory
- The L1 cache can be invalidated “once per CU”

A dispatch may send multiple wavefronts to a single CU
Using this option, the invalidate of GC/SC occurs only on the first wavefront of the dispatch
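
Both optimizations above can be sketched in a few lines. This is only a model of the described behaviour: `L2DirtyTracker` and `l1_invalidate_count` are hypothetical names, and the "one invalidate per wavefront" baseline used for comparison is an assumption rather than something the article states.

```python
class L2DirtyTracker:
    """Ignore invalidate/writeback requests that would have nothing to do."""

    def __init__(self):
        self.read_sc_since_invalidate = False
        self.wrote_sc_since_writeback = False

    def sc_read(self):
        self.read_sc_since_invalidate = True

    def sc_write(self):
        self.wrote_sc_since_writeback = True

    def request_invalidate(self):
        """Return True if the invalidate was actually performed."""
        if not self.read_sc_since_invalidate:
            return False  # no SC reads since the last invalidate: ignore
        self.read_sc_since_invalidate = False
        return True

    def request_writeback(self):
        """Return True if the writeback was actually performed."""
        if not self.wrote_sc_since_writeback:
            return False  # no SC writes since the last writeback: ignore
        self.wrote_sc_since_writeback = False
        return True


def l1_invalidate_count(wavefront_cus, once_per_cu):
    """wavefront_cus lists, per wavefront of one dispatch, the CU it ran on."""
    if once_per_cu:
        # Only the first wavefront on each CU triggers the invalidate.
        return len(set(wavefront_cus))
    # Assumed baseline: every wavefront triggers its own invalidate.
    return len(wavefront_cus)


l2 = L2DirtyTracker()
l2.sc_write()
# Three compute pipes all ask for a writeback; only the first one does work.
print([l2.request_writeback() for _ in range(3)])
print(l1_invalidate_count([0, 0, 1, 1, 1], once_per_cu=True))
```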

Edit: Added Custom Direct X 11 support: Can utilize the Direct X 11 feature set

Developers will be able to take advantage of Microsoft’s latest industry standard DirectX API — DirectX 11.1, but Sony has taken the time to improve upon it, pushing the feature set beyond what is available for PC games development.

Those improvements include better shader pipeline access, improved debugging support features out of the box, and much lower level access to the system hardware enabling developers to do “more cool things.” That’s achieved not only through a modified DirectX 11.1 API, but also a secondary low-level API specifically for the PS4 hardware.

http://www.geek.com/games/sony-iimprove-directx-11-for-the-ps4-blu-ray-1544364/

onQ123 said:
I also noticed that Xbox One GPU is DirectX 11.1+ & PS4 GPU is DirectX 11.2+

----------------------------------------------------------------------------------------------------------

Basically an update on the implementation of hUMA from the prior thread...
http://www.neogaf.com/forum/showthread.php?t=662537&highlight=vgleaks

-----------------------------------------------------------------------------------------------------------
In multiple threads some of us were debating whether it was thread worthy or not... I took the precaution of making one anyway

kensama said:
Don't know if already posted but new information from hUMA PS4

http://www.vgleaks.com/more-exclusive-playstation-4-huma-implementation-and-memory-enhancements-details/

TheRealTalker said:
phosphor112 said:

New news, new thread. That's the rules.

lol the amount of hUMA threads it will make if we take this analogy would be ridiculous

Ha! showed you!... wait a minute

KyleOnTheRun · Aug 29, 2013

But what does it mean?

DodgerSan · Aug 29, 2013

Could anyone boil down the real world implications of this, especially as it pertains to multi-plats?

GhostWriter24 · Aug 29, 2013

Just show me the games.

NoLootBoxDev · Aug 29, 2013

I know some of those words........

Mikey Jr. · Aug 29, 2013

I understood everything.

RoboPlato · Aug 29, 2013

Article is a bit too technical for my understanding but it seems like this will provide PS4 with some significant efficiency gains and will streamline programming for certain processes by a significant margin. It seems like this could be taken advantage of by more devs than I thought if some of the stuff is as simple to utilize as the article makes it sound.

Nafai1123 · Aug 29, 2013

PS4 be trackin dirty

Black Republican · Aug 29, 2013

id rather take Cboat translation lessons than this

qa_engineer · Aug 29, 2013

So how many rams does the ps4 have?

TheRealTalker · Aug 29, 2013

Black Republican said:
id rather take Cboat translation lessons than this

lol so true... its funny since I'm starting a degree in this... Oh boy I'm f'ed

Ricky_R · Aug 29, 2013

It's like I woke up without a brain.

UraMallas · Aug 29, 2013

Mikey Jr. said:
I understood everything.

Therefore, you understand nothing.

Downslide · Aug 29, 2013

good stuff op

Oppo · Aug 29, 2013

So what you are saying is... this is a form of... one might term it...

Blast processing?

I do love me some dirty onion & garlic. that's Cajun style

SwiftDeath · Aug 29, 2013

But I like it

ShaneDude · Aug 29, 2013

So uhh.... is this unique to the ps4?

Canis lupus · Aug 29, 2013

It sounds really advanced and good.

Bad_Boy · Aug 29, 2013

darkside31337 · Aug 29, 2013

Wheres Jeff when you need him?

Brera · Aug 29, 2013

So basically...what you mean is...secret sauce?

qa_engineer · Aug 29, 2013

ShaneDude said:
So uhh.... is this unique to the ps4?

Yes. No other apu has this feature set. Fact

Edit: What?

famousmortimer · Aug 29, 2013

ShaneDude said:
So uhh.... is this unique to the ps4?

No. hUMA is a tech that AMD and others will be pushing over the coming years. It's believed that the Xbox One has something similar.

It's good news though. The less time the system needs to spend swapping memory the more time it can spend on other things. It's going to help games become more graphically rich and more complex as they become more and more efficient with the system. This happens with all hardware, but hUMA should multiply that effect. If this is as powerful and simple as it sounds... the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

Drencrom · Aug 29, 2013

darkside31337 said:
Wheres Jeff when you need him?

I don't think Jeff_Rigby is the guy you want when you are in need of a simple explanation

Goron2000 · Aug 29, 2013

RoboPlato · Aug 29, 2013

famousmortimer said:
No. hUMA is a tech that AMD and others will be pushing over the coming years. It's believed that the Xbox One has something similar.

It's good news though. The less time the system needs to spend swapping memory the more time it can spend on other things. It's going to help games become more graphically rich and more complex as they become more and more efficient with the system. This happens with all hardware, but hUMA should multiply that effect. If this is as powerful and simple as it sounds... the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

I think I just shed a tear.

USC-fan · Aug 29, 2013

This is the Onion+ bus that is only in the PS4. This is not something AMD has rolled into their products at this time.

Very cool and in depth info. Sony really made some very deep changes to this product. It's going to pay off years from now when they really start "coding to the metal." haha

Salex · Aug 29, 2013

Spongebob · Aug 29, 2013

Edit: nvm, sorry.

TheRealTalker · Aug 29, 2013

famousmortimer said:
No. hUMA is a tech that AMD and others will be pushing over the coming years. It's believed that the Xbox One has something similar.

It's good news though. The less time the system needs to spend swapping memory the more time it can spend on other things. It's going to help games become more graphically rich and more complex as they become more and more efficient with the system. This happens with all hardware, but hUMA should multiply that effect. If this is as powerful and simple as it sounds... the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

Dang... by the way, will this in any way help out the CPU's overall power, as in give it an extra boost

since both the ps4 and xbox one have similar CPUs but not GPUs

RoboPlato · Aug 29, 2013

USC-fan said:
This is the Onion+ bus that is only in the PS4. This is not something AMD has rolled into their products at this time.

Very cool and in depth info. Sony really made some very deep changes to this product. It's going to pay off years from now when they really start "coding to the metal." haha

I like the approach that they took to PS4. They knew it wouldn't be a beast in raw power since it needed to be affordable, quiet, and cool so they looked down the line and made a lot of improvements in efficiency for techniques that will be getting used in a few years and made them easy to access and utilize.

Spongebob said:
Old news.

http://www.neogaf.com/forum/showthread.php?t=662537

This is a different article. It has examples.

KAL2006 · Aug 29, 2013

So with PS4 having HuMa, GDDR5, more powerful GPU, and some other stuff I forgot.

How wide is the gap between Xbox One and PS4 in non-technical terms?

SwiftDeath · Aug 29, 2013

famousmortimer said:
If this is as powerful and simple as it sounds... the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

That sounds promising

I can't wait

Dr. Kaos · Aug 29, 2013

My analysis is that the PS4 is clearly turbocharged, and the turbo has cycles, just like a washing machine.

Spongebob · Aug 29, 2013

RoboPlato said:
I like the approach that they took to PS4. They knew it wouldn't be a beast in raw power since it needed to be affordable, quiet, and cool so they looked down the line and made a lot of improvements in efficiency for techniques that will be getting used in a few years and made them easy to access and utilize.

This is a different article. It has examples.

Thanks.

Edited my last post.

Americanmushroom · Aug 29, 2013

KAL2006 said:
So with PS4 having HuMa, GDDR5, more powerful GPU, and some other stuff I forgot.

How wide is the gap between Xbox One and PS4 in non-technical terms?

PS4 is more powerful but you'll probably only see a difference in first party games

PolarGamer · Aug 29, 2013

I miss the informative DBZ scale.

EMT0 · Aug 29, 2013

famousmortimer said:
No. hUMA is a tech that AMD and others will be pushing over the coming years. It's believed that the Xbox One has something similar.

It's good news though. The less time the system needs to spend swapping memory the more time it can spend on other things. It's going to help games become more graphically rich and more complex as they become more and more efficient with the system. This happens with all hardware, but hUMA should multiply that effect. If this is as powerful and simple as it sounds... the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

Is multiply really the right word? I'm asking because I honestly don't know if that was just a slip of the tongue, or if you're being literal. In the latter case...weak hardware redeemed.

BenouKat · Aug 29, 2013

famousmortimer said:
the difference between launch games and end of gen games will be the largest in console history. We'll look at Killzone and Forza and laugh that we thought they looked good.

Excellent !

Pretty funny to read when you've seen people complain on every forum on the internet that this gen will reach the end of its graphics capability very quickly.

TheRealTalker · Aug 29, 2013

KAL2006 said:
So with PS4 having HuMa, GDDR5, more powerful GPU, and some other stuff I forgot.

How wide is the gap between Xbox One and PS4 in non-technical terms?

there is also a direct X difference as the PS4 has a custom version of it that will help in things like shader pipeline access, etc...
http://www.geek.com/games/sony-iimprove-directx-11-for-the-ps4-blu-ray-1544364/

Prelude. · Aug 29, 2013

Americanmushroom said:
PS4 is more powerful but you'll probably only see a difference in first party games

Just like between the PS2 and the GC/Xbox multip- oh wait...

lherre · Aug 29, 2013

TheRealTalker said:
there is also a direct X difference as the PS4 has a custom version of it that will help in things like shader pipeline access, etc...
http://www.geek.com/games/sony-iimprove-directx-11-for-the-ps4-blu-ray-1544364/

Ps4 doesn't use DX ...

RoboPlato · Aug 29, 2013

lherre said:
Ps4 doesn't use DX ...

People are getting confused. It has a similar featureset to DX 11.2 (if I remember correctly) but is a different API.

TheRealTalker · Aug 29, 2013

lherre said:
Ps4 doesn't use DX ...

I said it was a custom version of it...

88random · Aug 29, 2013

Good stuff

Doc Evils · Aug 29, 2013

Every PS4 has a Cerny inside.

Oppo · Aug 29, 2013

KAL2006 said:
So with PS4 having HuMa, GDDR5, more powerful GPU, and some other stuff I forgot.

How wide is the gap between Xbox One and PS4 in non-technical terms?

I've been reading a lot of this stuff, and I am far from an expert, but I will give this a shot. This is my understanding, corrections are of course welcome.

So these two systems are very very similar. More comparable than any other two consoles in the same gen in history, I'd wager.

The Sony setup is very straightforward. It has high bandwidth access to a bunch of fast RAM. Most of the philosophy behind PS4 is their own 180 from the PS3. Nothing is particularly exotic or fancy. It's a workhorse and well provisioned. It sort of falls within the general philosophy of hUMA, which is to eliminate CPU/GPU bottlenecks. PS4 games will get up to speed quickly. New tricks will be discovered as time goes on because of the new communication abilities to be had between CPU and GPU.

The Xbone approach more closely resembles something like what Sony would have maybe done before. In order to use the much cheaper RAM they opted for, they are using some hUMA type techniques, notably a tiny chunk of fast eDRAM. This helps leverage the slower RAM into a much higher performance. But it's not as simple, it needs to be accounted for. Therefore that will have an impact.

Now much like Cell, hard software problems do get solved eventually, and performance increases, so we can also expect the Xbone performance to climb over time of course. But it's not quite as straightforward as the PS4 solution, and to my untrained eye it seems that you can see a bit of the compromise that Ms has opted to make, in the service of longevity/price/component suppliers/whatever.

But I still think the takeaway is that these boxes are more similar than not, and while the PS4 does indeed hold an edge, it still remains to be seen how that actually plays out in the wider market. All we know for sure is that Sony's first party will probably blow the doors off like they usually do but who knows what beyond that.

Xenex · Aug 29, 2013

Shearie · Aug 29, 2013

Doc Evils said:
Every PS4 has a Cerny inside.

Does this trump the infinite power of the cloud?

qa_engineer · Aug 29, 2013

RoboPlato said:
People are getting confused. It has a similar featureset to DX 11.2 (if I remember correctly) but is a different API.

PlayStation shader language. Its features exceed those of directx


PlayStation 4 hUMA implementation and memory enhancements details - Vgleaks
