Behind the scenes with ATI - Eric Demers interview

Register array, Ring Bus

PH!: For every thread of the R520 there are 32 registers, each 128 bits wide. This means a 256 KB register array for 512 threads, which seems like a huge number. Is it really “this big”, or did we make a mistake in our calculations? How does this number compare to earlier ATI or NVIDIA architectures? Are these 2 megabits really that much, and if so, how much space do they take up on the ASIC? How fast are these registers compared to the L1 and L2 caches of today's CPUs?

ED: Yes, that's the theoretical number. The actual number can be lower: if each of your threads does more work before being put to sleep, you can reduce the number of threads without affecting your latency-hiding capability. How much lower depends on your architecture, your ALU/texture ratio and many other factors.
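
The arithmetic behind the question, and the trade-off Demers describes, can be sketched in a few lines. The register-file size follows directly from the figures in the question (512 threads × 32 registers × 128 bits = 256 KB = 2 Mbit). The latency-hiding part is a deliberately simplified model assumed here for illustration, not ATI's actual scheduler: each thread keeps the ALU busy for a fixed number of cycles before sleeping on a fetch, so more work per thread means fewer threads in rotation. The 256-cycle latency and the `threads_to_hide_latency` helper are hypothetical.

```python
import math

def register_file_bytes(threads: int, regs_per_thread: int = 32, reg_bits: int = 128) -> int:
    """Register storage if every thread gets a full set of 128-bit GPRs."""
    return threads * regs_per_thread * reg_bits // 8

def threads_to_hide_latency(latency_cycles: int, alu_cycles_per_thread: int) -> int:
    """Threads needed in rotation so ALU work covers one fetch latency.

    Simplified model (an assumption, not ATI's method): each thread keeps the
    ALU busy for alu_cycles_per_thread cycles before it sleeps on a fetch.
    """
    return math.ceil(latency_cycles / alu_cycles_per_thread)

full = register_file_bytes(512)
print(full // 1024, "KB")          # 256 KB
print(full * 8 / 2**20, "Mbit")    # 2.0 Mbit

# Hypothetical 256-cycle fetch latency: doubling the work per thread
# halves the threads (and therefore the registers) needed to hide it.
for work in (4, 8):
    n = threads_to_hide_latency(256, work)
    print(work, n, register_file_bytes(n) // 1024, "KB")
```

Under this model, the full 256 KB is only needed when each thread does very little ALU work between fetches, which is why the real number "can be lower".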

Compared to previous parts from ATI, it's significantly more. The shader core is becoming one of the bigger parts of the design. Compared to NV? Well, since I don't know how they've done things, it's hard for me to compare :)

As for L1/L2 caches, these registers are more like the immediate registers of a CPU. If you check the GPGPU results, you'll see that our performance is independent of the number of GPRs in use, exactly as with a CPU's registers. I know that's not true of other architectures, but it's an important aspect of ours. It's also one of the prime reasons we are so well suited to GPGPU work: as shaders get longer and more complex, our performance stays perfectly predictable. There's no need for fancy driver shader games or for falling back to partial precision.
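
One way to picture the contrast Demers draws is to compare two register-allocation policies. This is a hypothetical model, not a description of any specific chip: in the first, every thread owns a full set of 32 GPRs (as the question's 256 KB arithmetic assumes), so the number of threads in flight does not depend on how many GPRs a shader uses. In the second, threads carve their registers out of a shared pool, so heavier shaders leave room for fewer threads and therefore less latency hiding. The 4096-register pool size is an arbitrary illustrative value.

```python
def threads_fixed_allocation(max_threads: int, gprs_used: int, gprs_per_thread: int = 32) -> int:
    """Every thread owns gprs_per_thread registers, so GPR usage does not change the thread count."""
    if gprs_used > gprs_per_thread:
        raise ValueError("shader uses more GPRs than each thread owns")
    return max_threads

def threads_shared_pool(pool_regs: int, gprs_used: int, max_threads: int) -> int:
    """Threads carve registers out of a shared pool: heavier shaders leave room for fewer threads."""
    return min(max_threads, pool_regs // gprs_used)

# Hypothetical shared pool of 4096 registers vs. a 512-thread fixed design.
for gprs in (4, 16, 32):
    print(gprs, threads_fixed_allocation(512, gprs), threads_shared_pool(4096, gprs, 512))
# 4 512 512
# 16 512 256
# 32 512 128
```

The fixed policy pays for its predictability with silicon: the full register array must exist even when shaders use few GPRs.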

PH!: In our opinion, the Ring Bus memory controller is the most elegant part of the R520. We are looking forward to hearing about (and testing) new tweaks. We saw the OpenGL optimization that boosted frame rates by 30%, which is a huge gain. How does the fine-tuning work? What can the controller logic do by itself, and what can programmers do through the driver? What clock speeds can this Ring Bus architecture reach? Is it possible to extend it to 2x512 bits?

ED: The memory controller was designed for the fastest GDDR4, or at least 1.5 GHz. It's easy to scale the design down or up; it's all linear. It was designed to be flexible in those ways, since we have so many different products. The fine-tuning involves some trial and error, but we use annealing and DNA (genetic) algorithms to narrow the search space and zero in on settings. Applications are complex, and arriving at the “perfect” MC settings is difficult. We have an application team in place whose job is to improve performance, both through MC tuning and through driver changes. We have an amazing amount of MC tuning potential at our disposal: so much that going through all the possibilities would take longer than the universe has existed so far. However, with some educated guesses and clever reduction algorithms, we can manage the task. You should expect to see improvements over time, both across the board and in specific games. We won't always tell you whether a performance improvement came from MC tuning or from some other driver enhancement, but it's a powerful new tool for improving performance.
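
To illustrate the general idea of annealing over a settings space, here is a minimal simulated-annealing sketch. Everything about the memory controller in it is hypothetical: the six integer "knobs", their eight levels, the `TARGET` vector and the quadratic `cost` function stand in for real MC registers and for measured application performance, none of which are public. It does not sketch the genetic ("DNA") approach Demers also mentions. This toy space (8^6 combinations) could be searched exhaustively. The point of annealing is that it still works when, as Demers says, the real space is far too large for that.

```python
import math
import random

NUM_KNOBS = 6                 # hypothetical number of MC settings
LEVELS = 8                    # hypothetical: each setting takes levels 0..7
TARGET = [3, 5, 1, 7, 2, 4]   # stand-in for the unknown "best" settings

def cost(settings):
    """Synthetic cost standing in for a measured frame time (lower is better)."""
    return sum((s - t) ** 2 for s, t in zip(settings, TARGET))

def neighbor(settings, rng):
    """Nudge one randomly chosen knob up or down by one level, clamped to range."""
    new = list(settings)
    i = rng.randrange(len(new))
    new[i] = min(LEVELS - 1, max(0, new[i] + rng.choice((-1, 1))))
    return new

def anneal(start, steps=5000, t0=10.0, seed=0):
    """Simulated annealing with a linear cooling schedule; returns the best settings seen."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = current, current_cost
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # cools from t0 toward zero
        cand = neighbor(current, rng)
        delta = cost(cand) - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

best, best_cost = anneal([0] * NUM_KNOBS)
print(best, best_cost)
```

Accepting some worse moves while the "temperature" is high lets the search escape local optima. In a real tuning loop, each `cost` call would be an expensive benchmark run, which is why narrowing the space with educated guesses matters.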

PH!: As far as we know, the early production problems only affected products with the Ring Bus memory controller. Is there any connection between the issue and the memory controller itself?

ED: No, the early production problems had nothing to do with any specific part of the design. It wasn't the MC, the graphics core or any other particular block; it was all over the place. There was a design flaw in a circuit that did not show up in any of the checks we run when producing ASICs. It was internal to a non-ATI design. Once we found the problem, it was trivial to fix, but it delayed our products by many months.
