
Xbox One hardware breakdown by ExtremeTech.com after HotChips reveal

Chobel

Member
Is the 204GB/s peak bandwidth ever gonna be explained? And how the hell is "109GB/s minimum" even possible? I thought 109GB/s was the peak bandwidth in the read-only or write-only case.
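One arithmetic reading that fits the published figures, assuming a 1024-bit ESRAM interface at 853MHz plus a simultaneous read and write on 7 of every 8 cycles; this is a guess at the math, not an official breakdown:

```cpp
// Back-of-the-envelope check of the 109GB/s and 204GB/s figures.
// The 7-of-8-cycles read+write assumption is speculation, not confirmed.
#include <cstdio>

int main() {
    const double clock_hz = 853e6;        // ESRAM/GPU clock after the bump
    const double bytes_per_cycle = 128;   // assumed 1024-bit interface
    const double one_way = clock_hz * bytes_per_cycle;   // read OR write
    const double peak = one_way * (1.0 + 7.0 / 8.0);     // read AND write
    std::printf("one-way: %.1f GB/s, combined peak: %.1f GB/s\n",
                one_way / 1e9, peak / 1e9);  // ~109.2 and ~204.7
    return 0;
}
```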
 

Klocker

Member
With regards to the above statement - I understand its benefit over PC tech, but how does it compare to PS4?

I'd assumed he meant PS4, especially given the quote was re-tweeted by Yoshida.
"What we’re seeing with the consoles are actually that they are a little bit more powerful than we thought for a really long time – ESPECIALLY ONE OF THEM, but I’m not going to tell you which one," Nilsson told VideoGamer.com at Gamescom earlier today.

“And that makes me really happy. But in reality, I think we’re going to have both those consoles pretty much on parity – maybe one sticking up a little bit.”

I think Yoshida jumped on the quote, but my reading is that one of them is "surprisingly more powerful than we thought" (the Xbone was the one thought to not be as powerful), ergo the Xbox One is the surprise here, which is why they will be "consoles pretty much on parity". "One sticking up a little" is the PS4 keeping a slight edge over the Xbone, but not as big an edge as raw specs have led people to believe.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Good. If XB1 is capable of the same type of CPU-GPU algorithms as PS4 then everybody wins big.

It was always capable of GPGPU. The article actually doesn't present or discuss any information that we didn't already have months ago. The HotChips talk actually "just" confirmed pretty much everything that we knew from the leaked Durango documents (after taking the change in GPU clock from 800MHz to 853MHz into account), which is nice.

The question is not whether it is capable of GPGPU but whether its memory subsystem supports certain use cases as well as the PS4's more capable setup does. It's a bit weird that the article stresses the bandwidth figures between the memory pools and clients in particular, since those have been known for quite some time now. The very fact that they are all "inferior" compared to the PS4's setup was actually taken as another disadvantage for the XB1. So I am a bit puzzled why the article interprets those figures as a surprising and good thing.

Furthermore, the article is just plain wrong in some places. For instance, the fact that the ESRAM is physically implemented as four blocks of 8MB doesn't matter at all. It neither affects performance nor prevents a unified address space for the ESRAM. And, as others have stated, the ESRAM is not a cache. It's first and foremost there to hold pixelbuffers. (It doesn't have to be used in that way, but it wouldn't make any sense not to use it for that.) Apart from that, it can be used for other use cases. In combination with the DMEs, textures can be prefetched and stored there. Or it can be used as a generic scratchpad for GPGPU.

However, all of this is no secret sauce. It is still "just" memory. It is there to provide additional bandwidth that the DDR3 by itself just can't provide. Its downside in comparison to the PS4 is that this pool is just 32MB big, whereas the PS4 has more bandwidth on the entire memory pool. 32MB is enough to store the most important pixelbuffers. Nevertheless, you can still saturate the pool easily if you are using deferred (two-pass) rendering, which uses multiple "auxiliary" buffers for storing information gathered in the first pass. KZ:SF, for instance, uses 39MB at 1080p for pixelbuffers alone.
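To put the 32MB in perspective, here's a back-of-the-envelope calculation. The render-target layout below is a hypothetical example, not KZ:SF's actual format, though it happens to land near the 39MB figure:

```cpp
// Hypothetical 1080p G-buffer: three 32-bit targets plus one 64-bit target.
// Illustrative only; not any shipping game's real layout.
#include <cstdio>

int main() {
    const double pixels = 1920.0 * 1080.0;
    const double mb = 1024.0 * 1024.0;
    const double bytes = pixels * (3 * 4 + 1 * 8);  // 20 bytes per pixel
    // One 32bpp 1080p target alone is ~7.9MB; four targets like these
    // already need ~39.6MB, overflowing a 32MB pool.
    std::printf("G-buffer footprint: %.1f MB\n", bytes / mb);
    return 0;
}
```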

As already stated in many other threads, the actual differences in GPGPU are not "radical" in general, but there are some: for instance, the PS4 supports selective control of its GPU cache lines to distinguish between cached data that is shared with the CPU and data that is not. The XB1 seems to have to flush the entire GPU cache when it wants to synchronize. In addition, the PS4 has more scheduling and buffering logic for GPU instructions to saturate its ALUs more efficiently. Independent of what "hUMA" is supposed to be comprised of, when we discuss the memory subsystems and how they support GPGPU, there are some differences.
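A minimal sketch of that difference, using hypothetical stand-in functions rather than real PS4 or XB1 APIs:

```cpp
// Conceptual sketch only: both functions below are hypothetical stand-ins
// for driver-level operations, not real PS4 or XB1 APIs.
#include <cstddef>
#include <cstdio>

void gpu_flush_entire_cache() {                         // stand-in stub
    std::puts("write back / invalidate every GPU cache line");
}

void gpu_writeback_shared_lines(void*, std::size_t) {   // stand-in stub
    std::puts("write back only the lines tagged as CPU-shared");
}

// XB1-style coarse sync, as described above: everything goes, including
// cached data the CPU never looks at.
void sync_coarse() { gpu_flush_entire_cache(); }

// PS4-style selective sync, as described above: only the shared lines go,
// so the GPU's private working set stays cached.
void sync_selective(void* shared, std::size_t n) {
    gpu_writeback_shared_lines(shared, n);
}

int main() {
    char buf[64];
    sync_coarse();
    sync_selective(buf, sizeof buf);
    return 0;
}
```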

Again, this does not mean that the XB1 cannot do GPGPU. I am not even interested in "console war" comparisons, just in understanding the tech better. And the conclusions drawn by the article are either old news or are incorrect.
 

ekim

Member
I think Yoshida jumped on the quote, but my reading is that one of them is "surprisingly more powerful than we thought" (the Xbone was the one thought to not be as powerful), ergo the Xbox One is the surprise here, which is why they will be "consoles pretty much on parity". "One sticking up a little" is the PS4 keeping a slight edge over the Xbone, but not as big an edge as raw specs have led people to believe.

I never thought about it this way. Makes sense.
 

BigJoeGrizzly

Neo Member
It's amazing how some of you can truly think you know more (or better) than the people working (and engineering) for these companies. Let alone websites like ExtremeTech, that specialize in knowing about computer tech in extreme detail. The "they are absolutely wrong" statements made by some here really amuse me.
 

Klocker

Member
However, all of this is no secret sauce. It is still "just" memory. It is there to provide additional bandwidth that the DDR3 by itself just can't provide. Its downside in comparison to the PS4 is that this pool is just 32MB big, whereas the PS4 has more bandwidth on the entire memory pool. 32MB is enough to store the most important pixelbuffers. Nevertheless, you can still saturate the pool easily if you are using deferred (two-pass) rendering, which uses multiple "auxiliary" buffers for storing information gathered in the first pass. KZ:SF, for instance, uses 39MB at 1080p for pixelbuffers alone.

I can't get into a full-on tech discussion with you, as you know more than I do, but the implication from the feedback trickling in is that it possibly makes up a fair margin of the originally perceived memory-speed deficit. As was its intention from the outset, as you noted. You are correct about the framebuffer, but there may be ways that MS and devs will get creative to find all kinds of unique solutions to leverage the architecture.

Devs are smart. ;)


But "having similar onion , garlic bus to ps4" is new information to what we knew last week even if we don't know all the details of its operation yet
 

Kalren

Member
Lord Cerny says it won't happen until year 3 or 4. Will just have to see what 2017 brings us.

This is not a correct interpretation of what Cerny said. The context of Cerny's Year 3 or 4 quote is that there are modifications that his team made to the APU that will not be effectively utilized until year 3 or 4.

The base power of the PS4 is ~40% greater than the X1's, and it will be reflected in the games that are released, even at launch.
 

KidBeta

Junior Member
It's amazing how some of you can truly think you know more (or better) than the people working (and engineering) for these companies. Let alone websites like ExtremeTech, that specialize in knowing about computer tech in extreme detail. The "they are absolutely wrong" statements made by some here really amuse me.

Well they are most certainly wrong about some things.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
I can't get into a full-on tech discussion with you, as you know more than I do, but the implication from the feedback trickling in is that it possibly makes up a fair margin of the originally perceived memory-speed deficit.

Sure, otherwise it wouldn't be there. The point is that the overall system is still less flexible, so I wouldn't advertise it as a performance benefit. It's a good mitigation of a performance deficit, but it came at the cost of sacrificing 40% of the GPU's ALU.

But "having similar onion , garlic bus to ps4" is new information to what we knew last week even if we don't know all the details of its operation yet

No, they are just using the names that the respective PS4 buses have, but the information itself is not new:

(image: durango_memory.jpg)


http://www.vgleaks.com/durango-memory-system-overview/

They are talking about the 30GB/s bus between the GPU and the north bridge and the direct 68GB/s bus between the GPU and main memory.
 

Bundy

Banned
This is not a correct interpretation of what Cerny said. The context of Cerny's Year 3 or 4 quote is that there are modifications that his team made to the APU that will not be effectively utilized until year 3 or 4.

The base power of the PS4 is ~40% greater than the X1's, and it will be reflected in the games that are released, even at launch.
Amen

But "having similar onion , garlic bus to ps4" is new information to what we knew last week even if we don't know all the details of its operation yet
Not it isn't! They've just used the "PS4 names".
 

Klocker

Member
No, they are just using the names that the respective PS4 buses have, but the information itself is not new:

Amen


No it isn't! They've just used the "PS4 names".



OK, well, it's new to me, as I understood the Onion/Garlic buses were proprietary PS4 tech.

But maybe I was just not paying enough attention to have heard otherwise :)

Edit.....

Onion+ / selectively invalidating cache lines / more queues are PS4-only.

There is proprietary tech in the form of the additional "Onion+" bus, which is what Cerny is referring to in this quote:



http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

Are we sure that the Xbone can not do that?
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
OK, well, it's new to me, as GAF told me the Onion/Garlic buses were proprietary PS4 tech.

There is proprietary tech in the form of the additional "Onion+" bus, which is what Cerny is referring to in this quote:

"First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2
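A minimal sketch of the pattern Cerny describes, where coherent_alloc() is a hypothetical stand-in for an allocator returning memory that the GPU accesses past its own L1/L2:

```cpp
// Sketch only: coherent_alloc() is a hypothetical stand-in, not a real
// PS4 API. The point is that small handshake data written straight to
// system memory needs no cache flush/invalidate step.
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <new>

struct Ticket {
    std::atomic<uint32_t> ready{0};  // GPU would set this to 1...
    uint32_t result{0};              // ...after filling this in
};

void* coherent_alloc(std::size_t n) { return std::malloc(n); }  // stand-in

int main() {
    auto* t = new (coherent_alloc(sizeof(Ticket))) Ticket{};
    // GPU side (conceptually): t->result = ...; then t->ready.store(1).
    // CPU side: since the GPU wrote past its own caches, no GPU cache
    // flush is needed before this load can observe fresh data.
    if (t->ready.load(std::memory_order_acquire)) {
        // consume t->result
    }
    return 0;
}
```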
 

KidBeta

Junior Member
OK, well, it's new to me, as I understood the Onion/Garlic buses were proprietary PS4 tech.

But maybe I was just not paying enough attention to have heard otherwise :)




Are we sure that the Xbone can not do that?

Well, it's specifically described as a modification Sony made, so if so, someone's in trouble.
 
Anyone who really thought the Xbox One was going to launch at $100 more and be significantly weaker was just being dumb or a crazy fanboy. It may, in fact, be weaker, but it would never be so much so that third party games would look different.

So you're completely disregarding the fact that the released specs of both systems show the PS4 to have ~40% better performance, and that the X1 is $100 more because of Kinect? Because fanboys?
 

USC-fan

Banned
Anyone who really thought the Xbox One was going to launch at $100 more and be significantly weaker was just being dumb or a crazy fanboy. It may, in fact, be weaker, but it would never be so much so that third party games would look different.
Lol joke post?
 

dude819

Member
So you're completely disregarding the fact that the released specs of both systems show the PS4 to have ~40% better performance, and that the X1 is $100 more because of Kinect? Because fanboys?

It has better performance if you simply look at the parts used and their individual performance on paper. Clearly, based on this article, MS is doing some trickery (cue unoriginal hacks for "secret sauce" jokes) to get similar performance out of lesser parts. Specifically, this hUMA/shared-memory stuff.

As far as I can tell, it's like saying the PS4 is a vacuum and the Xbox One is an automated broom. Same idea but one is straight horsepower and the other is some bullshit MS cooked up.

Who knows? The Xbox One could be straight trash or the PS4 could melt in its tiny casing on day 3. But, as far as I have seen, they are about the same.
 

heelo

Banned
I think Yoshida jumped on the quote, but my reading is that one of them is "surprisingly more powerful than we thought" (the Xbone was the one thought to not be as powerful), ergo the Xbox One is the surprise here, which is why they will be "consoles pretty much on parity". "One sticking up a little" is the PS4 keeping a slight edge over the Xbone, but not as big an edge as raw specs have led people to believe.

I think this is easily the most reasonable interpretation of the quote.

If you were to take it otherwise, i.e. that the PS4 was surprisingly more powerful than they thought and that the PS4 will be slightly better performing than the XB1, then the logical conclusion would be that they initially thought the XB1 was more powerful than the PS4. That's an unreasonable conclusion, based on nothing more than RAM type and GPU specs.
 

KidBeta

Junior Member
It has better performance if you simply look at the parts used and their individual performance on paper. Clearly, based on this article, MS is doing some trickery (cue unoriginal hacks for "secret sauce" jokes) to get similar performance out of lesser parts. Specifically, this hUMA/shared-memory stuff.

As far as I can tell, it's like saying the PS4 is a vacuum and the Xbox One is an automated broom. Same idea but one is straight horsepower and the other is some bullshit MS cooked up.

Who knows? The Xbox One could be straight trash or the PS4 could melt in its tiny casing on day 3. But, as far as I have seen, they are about the same.

but both are hUMA/have shared memory...
 

dude819

Member
but both are hUMA/have shared memory...

The PS4 has the real (marketing-gimmick) hUMA setup. MS basically built their own version.

So they both do the same thing but the Internet would have you believe this is another giant strike against the Xbox One.
 

vpance

Member
I think this is easily the most reasonable interpretation of the quote.

If you were to take it otherwise, i.e. that the PS4 was surprisingly more powerful than they thought and that the PS4 will be slightly better performing than the XB1, then the logical conclusion would be that they initially thought the XB1 was more powerful than the PS4. That's an unreasonable conclusion, based on nothing more than RAM type and GPU specs.

The surprise is for the PS4, since 3rd parties always speak like both are equal. It's unlikely to be the Xbone because of the state of its SDK (CBOAT rumors, Avalanche devs, etc.)
 

KidBeta

Junior Member
Can someone please explain the difference between the ESRAM being a cache or a scratchpad, and why the distinction is so important?

Scratchpad: http://en.wikipedia.org/wiki/Scratchpad_memory

Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a microprocessor ("CPU"), scratchpad refers to a special high-speed memory circuit used to hold small items of data for rapid retrieval.
It can be considered similar to the L1 cache in that it is the next closest memory to the ALU after the internal registers, with explicit instructions to move data to and from main memory, often using DMA-based data transfer. In contrast with a system that uses caches, a system with scratchpads is a system with Non-Uniform Memory Access latencies, because the memory access latencies to the different scratchpads and the main memory vary. Another difference with a system that employs caches is that a scratchpad commonly does not contain a copy of data that is also stored in the main memory.

Cache:

In computer science, a cache (/ˈkæʃ/ kash)[1] is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere. If requested data is contained in the cache (cache hit), this request can be served by simply reading the cache, which is comparatively faster. Otherwise (cache miss), the data has to be recomputed or fetched from its original storage location, which is comparatively slower. Hence, the greater the number of requests that can be served from the cache, the faster the overall system performance becomes.

Bolded is the important difference.
 

Hollow

Member
But, as far as I have seen, they are about the same.

I think you'll have to wait till November, when we can actually compare the same games running on both platforms, before you can make this kind of argument.
The specs favor the PS4, though.

OT - This article just makes it look like Microsoft has been just as efficient with their buses as Sony has.
 

Klocker

Member
The surprise is for the PS4, since 3rd parties always speak like both are equal.


Sorry, that's a stretch, as it requires a third unrelated tangent (assuming what third parties will or won't say) that is completely absent from the quote.
 
It has better performance if you simply look at the parts used and their individual performance on paper. Clearly, based on this article, MS is doing some trickery (cue unoriginal hacks for "secret sauce" jokes) to get similar performance out of lesser parts. Specifically, this hUMA/shared-memory stuff.

As far as I can tell, it's like saying the PS4 is a vacuum and the Xbox One is an automated broom. Same idea but one is straight horsepower and the other is some bullshit MS cooked up.

Who knows? The Xbox One could be straight trash or the PS4 could melt in its tiny casing on day 3. But, as far as I have seen, they are about the same.

What? The hUMA/shared-memory stuff is on the PS4, and there are conflicting articles on whether the X1 has it. From a purely hardware standpoint, I don't see how they are about the same at all.

Xbox One:
1.31 TFLOPS
40.9 GTex/s
13.6 GPix/s
68GB/s DDR3
109GB/s eSRAM

PS4:
1.84 TFLOPS (+40%)
57.6 GTex/s (+40%)
25.6 GPix/s (+90%)
176GB/s GDDR5
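For reference, the percentage deltas above follow directly from the raw numbers; a quick check (note the pixel-rate gap comes out closer to +88%):

```cpp
// Verifying the deltas quoted above from the raw spec numbers.
#include <cstdio>

int main() {
    std::printf("TFLOPS: +%.0f%%\n", (1.84 / 1.31 - 1) * 100);  // ~+40%
    std::printf("GTex/s: +%.0f%%\n", (57.6 / 40.9 - 1) * 100);  // ~+41%
    std::printf("GPix/s: +%.0f%%\n", (25.6 / 13.6 - 1) * 100);  // ~+88%
    return 0;
}
```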
 

Klocker

Member
Scratchpad: http://en.wikipedia.org/wiki/Scratchpad_memory



Cache:



Bolded is the important difference.

Edit... oh and thanks!

So do I understand correctly that if it's a scratchpad, it will require a bit more programming to get data into the ESRAM, whereas if it's a cache, it does that automatically?

And then of course the next question: if the computer reporters are all referring to it as a cache in this Hot Chips discussion, how can we be so sure it is not?
 

grumble

Member
It has better performance if you simply look at the parts used and their individual performance on paper. Clearly, based on this article, MS is doing some trickery (cue unoriginal hacks for "secret sauce" jokes) to get similar performance out of lesser parts. Specifically, this hUMA/shared-memory stuff.

As far as I can tell, it's like saying the PS4 is a vacuum and the Xbox One is an automated broom. Same idea but one is straight horsepower and the other is some bullshit MS cooked up.

Who knows? The Xbox One could be straight trash or the PS4 could melt in its tiny casing on day 3. But, as far as I have seen, they are about the same.

The PS4 is, as of right now, looking to be significantly more powerful. They will not be at parity.

Also, the hacks that MS is using are also being done by Sony. We'll see what happens in the end, but I'd err on the side of guessing you're engaging in some wishful thinking.
 

KidBeta

Junior Member
So do I understand correctly that if it's a scratchpad, it will require a bit more programming to get data into the ESRAM, whereas if it's a cache, it does that automatically?

And then of course the next question: if the computer reporters are all referring to it as a cache in this Hot Chips discussion, how can we be so sure it is not?

Because you don't texture into a cache; you as a programmer (outside of some obtuse BIOS and osdev work) don't tell it what to do or what to store. Its use is 'transparent'.

Also, a cache wouldn't take up address space, which the eSRAM does.
 

gofreak

GAF's Bob Woodward
So do I understand correctly that if it's a scratchpad, it will require a bit more programming to get data into the ESRAM, whereas if it's a cache, it does that automatically?

And then of course the next question: if the computer reporters are all referring to it as a cache in this Hot Chips discussion, how can we be so sure it is not?

Well, if it were a cache, you wouldn't have Data Move Engines, for starters. It would also be possible for it to be coherent with memory in the rest of the system, but the diagrams make it clear it's not.
 

Skeff

Member
So to summarize, does it make games look better, play better, or make the process of making a game easier?

Look Better: In this case probably not, but it's possible, although unlikely, given the setups of both consoles.

Play Better: Yes, this will likely improve CPU-limited functions by using GPGPU in a more efficient manner, allowing for more advanced GPGPU algorithms compared to the alternative CPU-only algorithms.

Making it easier: Yes.
 

astraycat

Member
Can someone please explain the difference between the ESRAM being a cache or a scratchpad, and why the distinction is so important?

Caches are special bits of hardware that sit between main memory and whatever is consuming data. When data is requested, the hardware checks the caches first; if the data is not in a cache, the request goes up the cache chain and finally to main memory. Caches are automatic and transparent to the programmer (but that doesn't mean that programmers ignore the fact that they're there).

A scratchpad is programmer-managed. If a programmer wants to cache some data there, he or she will have to manually copy the data there from main memory. This is only worthwhile if the scratchpad has some superior quality (bandwidth, latency, etc.) over main memory. ESRAM theoretically has better bandwidth and maybe better latency.
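A minimal sketch of that manual staging, assuming a plain array as a stand-in for the ESRAM and memcpy as a stand-in for a Move Engine transfer:

```cpp
// Sketch only: 'esram' is a plain array standing in for the 32MB pool,
// and memcpy stands in for a DMA / Move Engine transfer.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kPoolBytes = 32u * 1024 * 1024;   // 32MB
static unsigned char esram[kPoolBytes];                 // stand-in pool

void process(std::vector<unsigned char>& main_mem) {
    const std::size_t n = std::min(main_mem.size(), kPoolBytes);
    std::memcpy(esram, main_mem.data(), n);   // stage in (programmer-managed)
    for (std::size_t i = 0; i < n; ++i)       // work in the fast pool
        esram[i] ^= 0xFF;
    std::memcpy(main_mem.data(), esram, n);   // stage back out
}

int main() {
    std::vector<unsigned char> buf(1024, 0x5A);
    process(buf);
    return 0;
}
```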
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Can someone please explain the difference between the ESRAM being a cache or a scratchpad, and why the distinction is so important?

"Scratchpad" is not really a technical term. What people mean by that is a small pool of freely usable memory. In this case, ESRAM is 32MB of fast memory that can be accessed by the GPU (that is, by programs running on the GPU) at will. The GPU can read and write any data it wants from and into that pool.

A cache is also a memory pool. However, caches are managed by hardware and transparent to software running on the processor. Software doesn't even know explicitly that the cache exists. A cache "mirrors" data that is stored in main memory according to a cache management algorithm. The purpose of that is that (1) caches are much, much faster than main memory and (2) applications usually don't access memory in a totally random way but tend to perform many subsequent reads/writes on the same coherent "block" of data. Hence, it makes sense to copy the block that is currently used by the application being executed on a processor into a very fast cache. When the application is done with that block, it is copied back into main memory. The algorithms that manage caches try to predict which blocks of data will be relevant to the application and which won't in order to always swap the least relevant block of data with the most relevant block of data. Those algorithms don't have knowledge about what the application will actually do but rely on predictions. The benefit is that the application (i.e. the programmer) does not have to manage the cache manually. The performance boosts provided by caches are, thus, not necessarily optimal, but they are general and benefit every application.

Caches are subdivided into "cache lines" which are the atomic unit of "blocks" of data that are managed by a cache. I think for Jaguar the size of a cache line is 64 bytes. Cache-coherency means that if multiple caches are mirroring the same space of memory, those caches are always synchronized. That means if processor A using cache A writes data to a certain memory address and processor B using cache B reads from that memory address, processor B will get the actual data that processor A wrote. Without cache-coherency, the update performed by processor A might still be "stuck" in cache A and not be visible to cache B and, hence, to processor B.

Addressable pools, like the ESRAM or main memory, have to be managed by the application.
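As a minimal sketch of what "managed by the application" can look like, here is a bump allocator over a fixed pool; the names are mine, not any console API:

```cpp
// Sketch only: a per-frame bump allocator over a fixed-size region.
// The application, not the hardware, decides what lives in the pool.
#include <cstddef>
#include <cstdint>

class PoolAllocator {
    uint8_t* base_;
    std::size_t size_, offset_ = 0;
public:
    PoolAllocator(void* base, std::size_t size)
        : base_(static_cast<uint8_t*>(base)), size_(size) {}

    // Returns nullptr when the pool is exhausted; align must be a power of two.
    void* alloc(std::size_t n, std::size_t align = 256) {
        const std::size_t p = (offset_ + align - 1) & ~(align - 1);
        if (p + n > size_) return nullptr;
        offset_ = p + n;
        return base_ + p;
    }
    void reset() { offset_ = 0; }   // e.g. once per frame
};

int main() {
    static unsigned char pool[1 << 20];   // 1MB stand-in region
    PoolAllocator a(pool, sizeof pool);
    void* rt = a.alloc(640 * 360 * 4);    // e.g. place a small render target
    (void)rt;
    a.reset();                            // start of the next frame
    return 0;
}
```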
 

The Flash

Banned
Look Better: In this case probably not, but it's possible, although unlikely, given the setups of both consoles.

Play Better: Yes, this will likely improve CPU-limited functions by using GPGPU in a more efficient manner, allowing for more advanced GPGPU algorithms compared to the alternative CPU-only algorithms.

Making it easier: Yes.

Cool. Wish I was hUMA.
 