David Kirk has an insatiable need for power. Graphics rendering power, that is. He’s fond of saying, “Why use a screwdriver when you can use a sledgehammer?” As nVidia’s Chief Scientist, Kirk has overseen development of architectures from the original GeForce, GeForce 2, and GeForce 3 through the GeForce FX and today’s GeForce 6 series.
We put ten questions to Dr. Kirk—all but one asked by ExtremeTech readers. Topics range from HDR to open-source drivers and everything in between, and this array of interesting questions from you, the readers, drew equally interesting answers from our guest guru. At the end of each question is the forum handle of the questioner.
Reader Question: I’ve read at a number of sites that the video processor on the 6 series GPUs was “somewhat broken,” but I’ve never heard any specifics. If you’re willing to, could you share with us exactly how this processor was broken? I just picked up a 6800 GT, so this would be very nice to know.—felraiser
David Kirk: Well, the first thing that I would say about “I’ve read at a number of sites…” is that you shouldn’t believe everything that you read! The video processor of the GeForce 6 series GPUs is just that—a processor—and it runs software for video encoding, decoding, and image processing and enhancement. The GeForce 6 series video processor is a new processor, the first of its kind, and there was no legacy (pre-GeForce 6) code to run on it. At the time the original GeForce 6800 shipped, very little new code had been written for the new processor. So, for early GeForce 6800 customers, there was little or no improvement in video quality or performance, and no reduction in CPU usage.
As time goes on, more and more software is written and optimized, so that more and more video functions are enabled (turned on). Newer drivers show better performance and quality on video encoding and decoding tasks. In fact, recent reviews show the video quality of the GeForce 6 video processor to be comparable to the quality of consumer electronics video devices. That’s a first for a PC video product. Also, each successive product in the GeForce 6 family has the benefit of more learning and more development time, so that we can continue to improve the processor design, instruction set, and performance. So, you can expect to see continued improvements.
As to the GeForce 6800 video processor being “broken,” I wouldn’t say that.
Reader Question: I’ve read that the ATI GPUs are based on a licensed MIPS RISC core. I’ve never read anything similar about the nVidia GPUs. I do recall reading that the NV3x series was based on a VLIW design, but with no supporting information. Did you license a DSP core for NV3x? What type of core does the NV4x use? RISC, VLIW? It would be nice to see this kind of information put out by nVidia, just for us gearheads who like to know this kind of stuff.—TheHardwareFreak
David Kirk: It’s interesting to hear that ATI’s GPUs are based on a MIPS RISC core. I had not heard that before. I think that the task of rendering triangles, vertices, and pixels is very different from the tasks of general-purpose computing, as implemented by a RISC or x86 core, and I find it hard to imagine how to make a graphics processor based on one of those technologies go fast at graphics tasks.
The NV3x shader core was a fully original design, created to be good at floating-point pixel shading. The NV4x core is the second generation of that design, optimized for greater parallelism and throughput. Terms like VLIW describe older, simpler architectures and don’t really apply too well to the kind of processors that we build now.
The shader cores on NV3x and NV4x process groups of pixels, and multiple channels of data per pixel, as in red, green, blue, and alpha, not to mention Z and stencil. So, is it scalar? Certainly not. Is it vector? Yes, in a way, but not purely vector—the data being processed is not one single vector. Each pixel is a vector, but multiple pixels, and multiple groups of pixels, can be processed differently. Also, for each cycle, in each pixel pipeline, multiple operations can happen in parallel—plus texture and special math operations. We talked a bit about this at Editors’ Day for the GeForce 6800, and there’s not too much more we’re willing to reveal at this point.
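To make the scalar-versus-vector distinction a little more concrete, here is a toy sketch of the idea (our illustration, with made-up types and names, not nVidia’s actual pipeline): a 2x2 group of pixels is processed together, each pixel carries a red/green/blue/alpha vector, and within a pixel all four channels are operated on at once while different pixels can still receive different inputs.

```cpp
#include <array>

// Toy data types for illustration only -- not NVIDIA hardware.
struct Vec4 { float r, g, b, a; };          // one pixel's color channels
using Quad = std::array<Vec4, 4>;           // a 2x2 group of pixels

// Hypothetical per-pixel shading step: a multiply-add applied to all four
// channels together (a real pipeline would do this in hardware lanes).
Vec4 madd(const Vec4& x, const Vec4& y, const Vec4& z) {
    return { x.r * y.r + z.r,
             x.g * y.g + z.g,
             x.b * y.b + z.b,
             x.a * y.a + z.a };
}

// Each pixel in the group can carry different data (say, a different
// texture result), so the machine is not purely "vector" across pixels,
// yet within a pixel the four channels are processed as a unit.
Quad shadeQuad(const Quad& color, const Quad& texel, const Quad& ambient) {
    Quad out;
    for (int i = 0; i < 4; ++i)             // four pixels per group
        out[i] = madd(color[i], texel[i], ambient[i]);
    return out;
}
```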
Reader Question: GeForce FX was the first product to be created utilizing technology from the 3Dfx buyout. How far in advance are GPUs planned out? A timeline of product development for NV40, or something similar, would be appreciated.—adg1034
David Kirk: The timeline for GPU development is about 1.5 to 2 years from concept to delivery. We have multiple projects in development at any given time, and people working on two or three overlapping generations at a time. NV40 started approximately 3 years ago. At the peak of development, as many as 500 engineers may be working on a product.
Reader Question: Is TurboCache going to be implemented in the near future with all of your GPUs, and if so, will the GPU have a limit to how much system memory it can consume? Or is it possible to unrestrictedly increase GPU performance with an increasing amount of system memory?—cfee2000
David Kirk: I believe that TurboCache is a wonderful piece of technology and can be beneficial to GPUs throughout the product family, from value all of the way to enthusiast levels of performance. It’s just a beautiful thing to take advantage of all of that PCIe bandwidth, wherever it exists!
GPU performance is not so much limited by the amount of memory, but more often by the amount of memory bandwidth, and the TurboCache access of system memory across the PCIe bus takes advantage of that additional bandwidth. Can this process continue indefinitely? Well, hopefully both local graphics memory bandwidth and, over time, PCIe bandwidth will continue to increase and give more available bandwidth to GPUs. For TurboCache in the value segment, the biggest benefit is price/performance—using the PCIe bandwidth instead of additional local memory gives outstanding value at a lower price.
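As a rough back-of-the-envelope illustration of the bandwidth argument (the numbers below are illustrative, not official TurboCache specifications): a first-generation PCI Express x16 slot offers roughly 4 GB/s in each direction, which is in the same ballpark as the local bandwidth of a narrow 64-bit memory bus, so treating system memory as a second pool can roughly double what a value board has to work with.

```cpp
#include <cstdio>

int main() {
    // Illustrative numbers only -- not official specifications.
    const double pcie_lanes      = 16;
    const double gb_per_lane     = 0.25;   // ~250 MB/s per lane, per direction (PCIe 1.x)
    const double pcie_bandwidth  = pcie_lanes * gb_per_lane;          // ~4 GB/s each way

    // A hypothetical value card: 64-bit memory bus at 550 MHz effective (DDR).
    const double bus_bytes       = 64 / 8.0;
    const double effective_mhz   = 550;
    const double local_bandwidth = bus_bytes * effective_mhz / 1000;  // ~4.4 GB/s

    std::printf("PCIe x16 (each direction):        ~%.1f GB/s\n", pcie_bandwidth);
    std::printf("Local 64-bit bus @ 550 MHz eff.:  ~%.1f GB/s\n", local_bandwidth);
    std::printf("Combined pool available to GPU:   ~%.1f GB/s\n",
                pcie_bandwidth + local_bandwidth);
    return 0;
}
```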
Reader Question: Will we see non-uniform rasterization, streaming ray-casts, or equivalent features to enable the kind of graphics we really want—real-time, dynamic lighting with a large number of lights?—Jason_Watkins
David Kirk: Yes.
Over time, everything that has been hardwired and fixed-function will become general-purpose. This will enable much more variety in graphics algorithms and ultimately, much more realism.
The good news, for my job security, is that graphics is still ultimately very very hard. Tracing streaming rays in all directions, reflecting between all of the objects, lights, and fog molecules in parallel is extremely hard. Nature does it . . . in real time! However, nature does it by using all of the atoms in the universe to complete the computation. That is not available to us in a modern GPU :-).
Graphics will continue to be a collection of clever tricks, to do just enough work to calculate visibility, lighting, shadows, and even motion and physics, without resorting to brutishly calculating every detail. So, I think that there’s a great future both in more powerful and flexible GPUs as well as ever more clever graphics algorithms and approaches.
Reader Question: Open-source and non-Microsoft platforms are growing in size and stature. What are you going to do to improve your standing amongst developers working in these areas? Even if existing policy requires that you protect your latest hardware research, why aren’t we at least getting the specs for hardware from previous years? Or why not work with developers and release a standard HAL API that hides the details but allows us to talk to your hardware in a low-level way for creating platform-specific drivers?—pmanias
David Kirk: nVidia supports open-source software platforms. Linux is only a small fraction of a percent of our user base, yet we devote far more energy and effort to making robust and performant Linux drivers than that share alone would justify. My sense is that developers on those platforms are quite happy with our efforts, particularly considering the alternatives!
As graphics hardware becomes more programmable—and easier to program—it makes more sense for us to release more interface and instruction set information, and I expect that will happen more and more over time. In the past, the interfaces to the graphics hardware have been too arcane and complex to even describe, let alone publish for public consumption. There is also a security issue—it’s possible for hackers to take bad advantage of raw hardware interfaces.
Reader Question: With all of the pressing towards more powerful graphics cards to handle features such as FSAA and anisotropic filtering, why do we still use inefficient, “fake” methods to achieve these effects?—thalyn
David Kirk: As I said in my answer to the question about ray-casting and lighting effects, graphics is all “fake” methods. The trick is to perform a clever fake and not get caught! All graphics algorithms do less work than the real physical universe but attempt to produce a realistic simulation.
I’ll discuss FSAA and anisotropic filtering separately.
FSAA—full scene antialiasing—implies that the full scene is completely correctly sampled. In other words, even if little tiny slivers of triangles 1/1000th of a pixel wide cover the screen, each pixel’s color will be correct, as if you had exactly divided each pixel into the area covered by each little triangle and integrated the colors. Of course, this isn’t practical, and it isn’t how any hardware or software works. We approximate that result using a technique known as “point sampling.” We evaluate, or sample, each pixel at one or more points, and assume that will be a good estimate.
As an aside, relative to your question, this is not an “inefficient” fake method—this is an extremely efficient fake method! It’s a lot less work to do a small amount of point sampling than to actually evaluate the area across a potentially infinite number of points. Given that we’re willing to sample more than one point per pixel, it’s also efficient in hardware, and faster, to assume that you can sample these points (2, 4, or more) at the same time, and process them together. This is part of the cleverness of multisample antialiasing. In the long term, I hope that display resolutions will get large enough that the amount of sampling will become less important. In the short term, though, I believe that FSAA as we know it is here to stay.
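As a concrete, heavily simplified illustration of point sampling in the spirit of 4x multisampling (our sketch, with a made-up sample pattern, not how any shipping hardware works): the pixel’s coverage of a triangle is estimated by testing a few fixed sub-pixel positions.

```cpp
#include <array>
#include <cstdio>

struct Point { float x, y; };

// Edge-function test: returns true if p lies inside triangle (a, b, c),
// assuming counter-clockwise winding. Standard rasterization math.
bool insideTriangle(Point p, Point a, Point b, Point c) {
    auto edge = [](Point e0, Point e1, Point q) {
        return (e1.x - e0.x) * (q.y - e0.y) - (e1.y - e0.y) * (q.x - e0.x);
    };
    return edge(a, b, p) >= 0 && edge(b, c, p) >= 0 && edge(c, a, p) >= 0;
}

// Estimate how much of the pixel at (px, py) the triangle covers by
// point-sampling four sub-pixel positions. The sample pattern is made up
// for illustration.
float coverage4x(float px, float py, Point a, Point b, Point c) {
    const std::array<Point, 4> offsets = {{ {0.25f, 0.25f}, {0.75f, 0.25f},
                                            {0.25f, 0.75f}, {0.75f, 0.75f} }};
    int hits = 0;
    for (const Point& o : offsets)
        if (insideTriangle({px + o.x, py + o.y}, a, b, c)) ++hits;
    return hits / 4.0f;   // blend weight for this triangle's color
}

int main() {
    // A thin sliver of a triangle crossing pixel (10, 10).
    float cov = coverage4x(10, 10, {9.5f, 10.1f}, {12.0f, 10.4f}, {9.5f, 10.3f});
    std::printf("estimated coverage: %.2f\n", cov);
    return 0;
}
```

With only four samples the estimate is coarse; a thin sliver can be over- or under-counted. That is exactly the trade-off between doing a small amount of work and integrating the true area.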
Now, on to anisotropic filtering. You know, trilinear filtering was good enough for the best of the best flight simulators for many many years, and every time you or I have been in an airplane, the pilot flying it was probably trained in a simulator that used trilinear filtering! Anisotropic filtering is a relatively subtle effect of texture filtering, allowing severely oblique-angled textures to look both sharp and smooth at the same time. It’s not really a “fake” method—it’s a better estimate of the “perfect” filtering than either bilinear or trilinear filtering. And, there are still better techniques beyond anisotropic filtering, but they require even more effort and hardware.
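Again purely as a sketch of the concept (our code; real hardware derives the footprint from texture-coordinate derivatives and is considerably more sophisticated): anisotropic filtering can be thought of as averaging several trilinear taps spread along the long axis of the pixel’s footprint in texture space. The trilinearSample stand-in below is just a procedural checkerboard so the example is self-contained.

```cpp
#include <cmath>

struct Color { float r, g, b; };
struct TexCoord { float u, v; };

// Stand-in for hardware trilinear filtering: here just a procedural
// checkerboard so the sketch is self-contained (the lod argument is ignored).
Color trilinearSample(TexCoord t, float /*lod*/) {
    int check = (int(std::floor(t.u * 8)) + int(std::floor(t.v * 8))) & 1;
    float v = check ? 1.0f : 0.2f;
    return { v, v, v };
}

// Conceptual anisotropic filter: average several trilinear taps spread
// along the major axis (du, dv) of the pixel's footprint in texture space.
Color anisotropicSample(TexCoord center, float du, float dv, float lod, int taps) {
    Color sum = {0, 0, 0};
    for (int i = 0; i < taps; ++i) {
        float t = (i + 0.5f) / taps - 0.5f;   // taps spread from -0.5 to +0.5
        Color c = trilinearSample({center.u + t * du, center.v + t * dv}, lod);
        sum.r += c.r; sum.g += c.g; sum.b += c.b;
    }
    return { sum.r / taps, sum.g / taps, sum.b / taps };
}
```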
Reader Question: What is your opinion about some of the new graphical features that are being implemented in games? Some are quite beneficial to GPU performance, such as normal-map compression and virtual displacement mapping. But others are very costly to performance, specifically high-dynamic-range lighting. After seeing the extreme over-saturation of light with HDR in Far Cry (even on the lower levels of HDR) and the performance hit it took, I personally am not convinced that HDR is a method that should be pursued any longer. What are your opinions on this subject?—cfee2000
David Kirk: I think that High Dynamic Range Lighting is going to be the single most significant change in visual quality over the next couple of years. It’s almost as big as shading.
The reason for this is that games without HDR look flat. They should, since they are only using a range of 256:1 in brightness—a small fraction of what our eyes can see. Consequently, low-dynamic-range imagery looks flat and featureless: no detail in the bright highlights, and none in the shadows. If you game using a DFP (LCD display), you probably can’t tell the difference anyway, since most LCD displays only have 5 or 6 bits of brightness resolution—an even narrower 32:1 or 64:1 range of brightness. On a CRT, you can see a lot more detail, and on the newer high-resolution displays, you can see not only the full 8 bits, but even more. There are new HDR displays that can display a full 16-bit dynamic range, and I can tell you that the difference is stunning. When these displays become more affordable in the next year or two, I don’t know how we’ll ever go back to the old way.
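To show why that 256:1 ceiling matters, here is a simplified sketch (ours, not a description of any particular game or driver) of the last step of an HDR pipeline: lighting is computed in floating point, where a bright sky can be thousands of times brighter than a shadow, and only at the end is that range tone-mapped down to the display’s 8 bits. The snippet uses the well-known Reinhard operator as one example of such a curve.

```cpp
#include <algorithm>
#include <cmath>

// Map a floating-point HDR channel (0 .. very large) down to the 8-bit
// range a conventional display can show, using the Reinhard operator
// x / (1 + x) as one simple example of a tone-mapping curve.
unsigned char toneMapChannel(float hdr, float exposure = 1.0f) {
    float scaled = hdr * exposure;              // artist/auto exposure control
    float ldr    = scaled / (1.0f + scaled);    // compresses highlights smoothly
    float gamma  = std::pow(ldr, 1.0f / 2.2f);  // rough display gamma
    return static_cast<unsigned char>(std::clamp(gamma, 0.0f, 1.0f) * 255.0f + 0.5f);
}
```

With this curve, a shadow value of 0.05 still maps to around code 64 while a value of 50 lands just under white, so detail survives at both ends instead of being clipped.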
Reader Question: Why is there so much under-utilization of card features along with game technology, e.g., shader model 2.0 to 2.0b or 3.0? Is it justified to bring out a new card every few months just for bragging rights?—mkashif
David Kirk: It takes a while for developers to learn to use the new features in the latest and greatest APIs and GPUs. That’s why we see close to a year between when the new features come out and when the games really take advantage of them.
There are a couple of reasons to keep pushing the performance and features of the state of the art. We want to continue to provide a better user experience, and also to give the hardest of the hardcore gamers the best that we can do at any given time. Also, we want to continue to push the state of the art in graphics, and if we don’t push ahead of where the game developers are today, they won’t be challenged to improve tomorrow. We have to take the plunge and count on the developers to follow us into the swamp!
Question: Are GPU architectures and Direct3D evolving toward a design where the distinction between vertex and pixel shaders essentially goes away?—davesalvator
David Kirk: For hardware architecture, I think that’s an implementation detail, not a feature.
For sure, the distinction between the programming models and instruction sets of vertex shaders and pixel shaders should go away. It would be soooo nice for developers to be able to program to a single instruction set for both.
As to whether the architectures for vertex and pixel processors should be the same, it’s a good question, and time will tell the answer. It’s not clear to me that an architecture for a good, efficient, and fast vertex shader is the same as the architecture for a good and fast pixel shader. A pixel shader would need far, far more texture math performance and read bandwidth than an optimized vertex shader. So, if you used that pixel shader to do vertex shading, most of the hardware would be idle, most of the time. Which is better—a lean and mean optimized vertex shader and a lean and mean optimized pixel shader, or two less-efficient hybrid shaders? There is an old saying: “Jack of all trades, master of none.”
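To illustrate the “single instruction set” idea, here is a hypothetical sketch (the names and operations are ours, not any real API or hardware): vertex and pixel programs are both written against the same small set of operations on four-component registers, even though the units that execute them could be tuned very differently underneath.

```cpp
#include <cmath>

// A hypothetical common register type and instruction set shared by
// vertex and pixel programs -- illustration only, not a real API.
struct Reg4 { float x, y, z, w; };

Reg4 mul(Reg4 a, Reg4 b)  { return { a.x*b.x, a.y*b.y, a.z*b.z, a.w*b.w }; }
Reg4 add(Reg4 a, Reg4 b)  { return { a.x+b.x, a.y+b.y, a.z+b.z, a.w+b.w }; }
float dot3(Reg4 a, Reg4 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// A vertex program written against the common instruction set:
// scale a position and pass it through.
Reg4 vertexMain(Reg4 position, Reg4 scale) {
    return mul(position, scale);
}

// A pixel program written against the same instruction set:
// simple N-dot-L diffuse lighting. A real pixel shader would also lean
// heavily on texture fetches, which is exactly the asymmetry Kirk points
// out: the programming model can be shared even if the hardware is not.
Reg4 pixelMain(Reg4 normal, Reg4 lightDir, Reg4 albedo) {
    float ndotl = std::fmax(0.0f, dot3(normal, lightDir));
    Reg4 intensity = { ndotl, ndotl, ndotl, 1.0f };
    return mul(albedo, intensity);
}
```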