Gtx 470/480 - 2.4 pre beta speed
Forum rules
NOTE: The software in this forum is not 100% reliable; these are development builds and are meant for testing by experienced Octane users. If you are a new Octane user, we recommend using the current stable release from the 'Commercial Product News & Releases' forum.
The GTX 470 gets a speed improvement of about 20% with the same number of CUDA cores,
the GTX 460 gets a speed improvement of about 20% with 112 added cores.
It does not make sense to me...
|DualXeon5130|10GB|2xGTX460 2GB|
We found out why. It is because the GTX 460 (GF104) has a different architecture. Look here:
http://www.anandtech.com/show/3809/nvid ... 200-king/2
One thing we haven’t discussed up until now is how an SM is internally divided up for the purposes of executing instructions. Since the introduction of G80 in 2006, the size of a warp has stayed constant at 32 threads wide. For Fermi, a warp is executed over 2 (or more) clocks of the CUDA cores – 16 threads are processed and then the other 16 threads in that warp are processed. For full SM utilization, all threads must be running the same instruction at the same time. For these reasons a SM is internally divided up in to a number of execution units that a single dispatch unit can dispatch work to:
16 CUDA cores (#1)
16 CUDA cores (#2)
16 Load/Store Units
16 Interpolation SFUs (not on NVIDIA's diagrams)
4 Special Function SFUs
4 Texture Units
With 2 warp scheduler/dispatch unit pairs in each SM, GF100 can utilize at most 2 of 6 execution units at any given time. It’s also because of the SM being divided up like this that it was possible for NVIDIA to add to it. GF104 in comparison has the following:
16 CUDA cores (#1)
16 CUDA cores (#2)
16 CUDA cores (#3)
16 Load/Store Units
16 Interpolation SFUs (not on NVIDIA's diagrams)
8 Special Function SFUs
8 Texture Units
This gives GF104 a total of 7 execution units, the core of which are the 3 blocks of 16 CUDA cores.
With 2 warp schedulers, GF100 could put all 32 CUDA cores to use if it had 2 warps where both required the use of CUDA cores. With GF104 this gets more complex since there are now 3 blocks of CUDA cores but still only 2 warp schedulers. So how does NVIDIA feed 3 blocks of CUDA cores with only 2 warp schedulers? They go superscalar.
In a nutshell, superscalar execution is a method of extracting Instruction Level Parallelism from a thread. If the next instruction in a thread is not dependent on the previous instruction, it can be issued to an execution unit for completion at the same time as the instruction preceding it. There are several ways to extract ILP from a workload, with superscalar operation being something that modern CPUs have used as far back as the original Pentium to improve performance. For NVIDIA however this is new – they were previously unable to use ILP and instead focused on Thread Level Parallelism (TLP) to ensure that there were enough warps to keep a GPU occupied.
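To make the ILP point concrete, here is a minimal sketch of my own (just an illustration, not Octane code): in the first kernel every FMA depends on the previous one, so the extra block of cores on GF104 has nothing to dual-issue; in the second kernel the three accumulators are independent, which is exactly the kind of instruction-level parallelism the superscalar dispatcher can exploit.

// Minimal ILP illustration for CUDA (my own sketch, not from Octane).
__global__ void dependent_chain(float *out, int iters)
{
    float a = 1.0f;
    for (int i = 0; i < iters; ++i)
        a = a * 1.000001f + 0.5f;   // each FMA waits on the previous one
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

__global__ void independent_chains(float *out, int iters)
{
    float a = 1.0f, b = 2.0f, c = 3.0f;
    for (int i = 0; i < iters; ++i) {
        // three independent FMAs per iteration: instruction-level parallelism
        // that a superscalar scheduler (GF104) can issue side by side
        a = a * 1.000001f + 0.5f;
        b = b * 1.000002f + 0.25f;
        c = c * 1.000003f + 0.125f;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b + c;
}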
So the GTX 460 has an extra block of 16 CUDA cores per SM, but how efficiently those cores are used depends on the character of the application (much like Intel Hyper-Threading, which is almost useless in games).
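For what it's worth, my own back-of-the-envelope numbers (the per-SM figures are from the article above, the SM count is the published GTX 460 spec):

GF100 SM: 2 x 16 = 32 CUDA cores
GF104 SM: 3 x 16 = 48 CUDA cores
GTX 460: 7 SMs x 48 = 336 cores in total
Counted the old GF100 way: 7 x 32 = 224 cores, so 336 - 224 = 112 "added" cores

That would be where the extra 112 cores reported under CUDA 3.2 come from.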

RESULT: CUDA 3.2 is only showing another 112 cores, but it is only showing them; this version of Octane cannot use all 3 blocks of this architecture. Maybe Radiance will find out how to make it work.
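A minimal sketch of how an application typically arrives at such a core count (my own code, not Octane's; the cores-per-SM helper is a hypothetical table based on NVIDIA's published Fermi specs, since the CUDA runtime itself only reports the SM count and compute capability):

// deviceQuery-style core counting sketch (assumption: Fermi only).
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: cores per SM by compute capability.
static int coresPerSM(int major, int minor)
{
    if (major == 2 && minor == 0) return 32;  // GF100/GF110: 2 blocks of 16
    if (major == 2 && minor == 1) return 48;  // GF104/GF106/GF108: 3 blocks of 16
    return -1;                                // unknown architecture
}

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        int cps = coresPerSM(prop.major, prop.minor);
        // e.g. GTX 460: 7 SMs, CC 2.1 -> 7 x 48 = 336 cores;
        // a renderer assuming 32 cores/SM would report only 224.
        printf("%s: %d SMs, CC %d.%d -> %d CUDA cores\n",
               prop.name, prop.multiProcessorCount, prop.major, prop.minor,
               cps > 0 ? cps * prop.multiProcessorCount : -1);
    }
    return 0;
}

Whether the third block of cores in each SM actually gets used is then entirely up to the ILP in the kernels, as described above; the driver reporting more cores does not change that.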
1 x GTX 460 2GB, Core i3 @ 3,7Ghz, 4GB Ram, Win 7 64-bit, 260.99 WHQL
Wow thanks for the info.
On the topic of cards. Tigerdirect has GTX 460 on sale for $109 USD.
I recently bought a system with a GTX 480 and am waiting for it to arrive. Can I use the 460 with a 480 in SLI mode?
Win 11 64GB | NVIDIA RTX3060 12GB