mbetke wrote:But even the Nvidia iray team guys say Kepler will not deliver the same performance as Fermi.
And the new Kepler pro boards are for double precision only.
The bad thing is that it seems you guys have to re-code half your renderer if NVIDIA decides to change something (to their advantage).
I wouldn't say that. The recent work to get good support for Keplers and Fermis into one build was a pain, but it was worth it. It gives us more freedom when building CUDA code for different architectures and devices, plus a few other benefits. We didn't change anything in the actual algorithms; it was more a build issue, but it should be sorted now.
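As a rough sketch of what a multi-architecture build involves (the flags are standard nvcc options; the file names are hypothetical, and the exact flags Octane uses are my assumption, not something stated here):

```shell
# Embed real machine code (SASS) for both Fermi (sm_20) and Kepler (sm_30)
# in one fat binary, plus PTX for compute_30 so newer GPUs can JIT-compile
# the kernels at load time.
nvcc -O3 \
  -gencode arch=compute_20,code=sm_20 \
  -gencode arch=compute_30,code=sm_30 \
  -gencode arch=compute_30,code=compute_30 \
  -o renderer kernel.cu
```

With a fat binary like this, the driver picks the matching SASS at runtime, so one executable covers both generations without touching the algorithms themselves.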
Obviously what I said about coherency on Keplers still applies, which is a bit more complicated to explain. Here is my personal view of all this:
If you look at Fermi you can see that it was leaning very much towards GPGPU computing. It was actually fairly revolutionary, but it brought NVIDIA lots of delays, manufacturing problems and heat problems. You can see that in the GTX 480, which had 16 multiprocessors but only 15 enabled. It also earned strong criticism from the gaming community, since the performance didn't really match the price. And the gaming community is much larger than the GPGPU community, thus a lot more important, and it doesn't care how fast Octane renders if it can't play <pick your favourite game> smoothly in HD with full details.
So they brought out the second generation of Fermi, which solved most of NVIDIA's technical problems, but since it had basically the same architecture as the first generation, there wasn't much improvement on the gaming side. All the while ATI was happily selling lots of Radeons, which were a lot less suitable for GPGPU than Fermis but worked great in games at reasonable prices.
Now, NVIDIA had to do something about that and the answer is Kepler. It's leading in most of the gaming benchmarks. Unfortunately they achieved this by chopping the number of multiprocessors in half and increasing the number of cores per multiprocessor by a factor of 6, i.e. having 3x as many cores in total. This is great for games that use rasterizing (i.e. all games), which is probably the most coherent algorithm you can think of, but it's not so great for GPGPU computing applications which often run fairly incoherent stuff.
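To illustrate why coherence matters so much on wide SIMT hardware, here is a toy model (my own sketch, not Octane code): a warp of 32 threads executes in lockstep, so when threads inside one warp take different branches, the hardware steps through each distinct path serially while the other threads idle. The cost below is just the number of serial passes, not real timing:

```python
# Toy SIMT divergence model. Threads are grouped into warps of 32; the
# "cost" of one branchy step is the number of distinct branch paths each
# warp has to execute in turn. Illustrative only.
import random

WARP_SIZE = 32

def warp_passes(branch_choices):
    """Total serial passes over all warps for one branchy step."""
    passes = 0
    for i in range(0, len(branch_choices), WARP_SIZE):
        warp = branch_choices[i:i + WARP_SIZE]
        passes += len(set(warp))  # each distinct path runs serially
    return passes

n_threads = 1024

# Coherent workload (rasterization-like): neighbouring threads do the same work.
coherent = [0] * n_threads

# Incoherent workload (path-tracing-like): each thread's ray hits one of
# 8 hypothetical materials at random.
random.seed(1)
incoherent = [random.randrange(8) for _ in range(n_threads)]

print(warp_passes(coherent))    # → 32 (32 warps, 1 path each)
print(warp_passes(incoherent))  # up to 256 (32 warps, up to 8 paths each)
```

The coherent case costs one pass per warp; the incoherent case costs up to eight, which is the kind of penalty a renderer running "fairly incoherent stuff" pays on an architecture tuned for rasterization.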
Where does that leave us? We will see. The fact that the vanilla Octane render algorithms run on a 680 roughly as fast as on one GPU of a 590, and with much lower energy consumption, gives me hope. I think we also have to accept that the field of GPGPU computing is under heavy construction, on the software as well as the hardware side. This means there will be a lot of progress in the coming years, but it also means that we will very likely have to adapt parts of our rendering code to new architectures and new concepts. That can be tedious, but also fun.
Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra