GTX 470/480 - 2.4 pre beta speed

A forum where development builds are posted for testing by the community.
Forum rules
NOTE: The software in this forum is not 100% reliable. These are development builds meant for testing by experienced Octane users. If you are a new Octane user, we recommend using the current stable release from the 'Commercial Product News & Releases' forum.
t0m4sk0
Licensed Customer
Posts: 34
Joined: Wed Oct 20, 2010 7:57 pm
Location: Slovakia

Could somebody please post the performance difference between 2.4 and 2.3 on a GTX 470 or GTX 480?
1 x GTX 460 2GB, Core i3 @ 3,7Ghz, 4GB Ram, Win 7 64-bit, 260.99 WHQL
**STK**
Licensed Customer
Posts: 16
Joined: Sat May 29, 2010 12:04 am

Here are some tests..
And yes there is a speed improvement
Attachments
scene from OctaneBenchmark with pre23_5
scene from OctaneBenchmark with pre24_1
Dual Xeon E5540 CPU 2.53 - 1x GTX470 - win7 64 -12GB ram
t0m4sk0
Licensed Customer
Posts: 34
Joined: Wed Oct 20, 2010 7:57 pm
Location: Slovakia

**STK** wrote:Here are some tests..
And yes there is a speed improvement
Alpha shadows off?
1 x GTX 460 2GB, Core i3 @ 3,7Ghz, 4GB Ram, Win 7 64-bit, 260.99 WHQL
**STK**
Licensed Customer
Posts: 16
Joined: Sat May 29, 2010 12:04 am

Yes. I just opened the OctaneBenchmark scene and hit render.
Dual Xeon E5540 CPU 2.53 - 1x GTX470 - win7 64 -12GB ram
**STK**
Licensed Customer
Posts: 16
Joined: Sat May 29, 2010 12:04 am

That was in direct lighting mode, not path tracing, for now.
I will test it with path tracing too.
Dual Xeon E5540 CPU 2.53 - 1x GTX470 - win7 64 -12GB ram
**STK**
Licensed Customer
Posts: 16
Joined: Sat May 29, 2010 12:04 am

OK, here are the results with path tracing.
There is still a speed improvement in both cases, with and without alpha shadows.
Attachments
pathtracing_intermediate_pre24_1.jpg
pathtracing_intermediate_pre23_5.jpg
pathtracing_alphashadows_intermediate_pre24_1.jpg
Dual Xeon E5540 CPU 2.53 - 1x GTX470 - win7 64 -12GB ram
oswald
Licensed Customer
Posts: 2
Joined: Thu Oct 21, 2010 6:25 pm
Location: Slovakia

The GTX 470 has a speed improvement of about 20% with the same number of CUDA cores,
the GTX 460 has a speed improvement of about 20% with 112 added cores.

it does not make sense to me...
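
A quick back-of-the-envelope check of why this looks odd, as a rough sketch only. It assumes the GTX 460 was reported with 224 cores before and 336 after (336 is the public spec; 224 is just 336 minus the 112 mentioned above), and that the roughly 20% seen on the GTX 470 is a pure software gain that applies equally to both cards. The function names are made up for illustration.

```python
# Assumed core counts for the GTX 460 (not stated in this thread):
# 336 from the public spec, 224 = 336 - 112 "added" cores.
GTX460_CORES_BEFORE = 224
GTX460_CORES_AFTER = 336

# Rough figure from the posts above: about 20% faster on both cards.
OBSERVED_GAIN = 1.20


def expected_speedup(software_gain, cores_before, cores_after):
    """Speedup if render speed scaled linearly with the reported core count."""
    return software_gain * cores_after / cores_before


def implied_core_contribution(observed_gain, software_gain):
    """Part of the observed speedup that is left over for the extra cores."""
    return observed_gain / software_gain


if __name__ == "__main__":
    naive = expected_speedup(OBSERVED_GAIN, GTX460_CORES_BEFORE, GTX460_CORES_AFTER)
    extra = implied_core_contribution(OBSERVED_GAIN, OBSERVED_GAIN)
    print(round(naive, 2))  # 1.8
    print(extra)            # 1.0
```

So under these assumptions the GTX 460 would be expected to land near 1.8x if the extra cores were fully used, while the measured 1.2x leaves a factor of about 1.0 for them, which is the part that does not add up.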
|DualXeon5130|10GB|2xGTX460 2GB|
GeoPappas
Licensed Customer
Posts: 429
Joined: Fri Mar 26, 2010 5:31 pm

oswald wrote:The GTX 470 has a speed improvement of about 20% with the same number of CUDA cores,
the GTX 460 has a speed improvement of about 20% with 112 added cores.

it does not make sense to me...
+1
t0m4sk0
Licensed Customer
Posts: 34
Joined: Wed Oct 20, 2010 7:57 pm
Location: Slovakia

We found out why. It is because the GTX 460 (GF104) has a different architecture. Look here:

http://www.anandtech.com/show/3809/nvid ... 200-king/2

One thing we haven’t discussed up until now is how an SM is internally divided up for the purposes of executing instructions. Since the introduction of G80 in 2006, the size of a warp has stayed constant at 32 threads wide. For Fermi, a warp is executed over 2 (or more) clocks of the CUDA cores – 16 threads are processed and then the other 16 threads in that warp are processed. For full SM utilization, all threads must be running the same instruction at the same time. For these reasons a SM is internally divided up in to a number of execution units that a single dispatch unit can dispatch work to:

16 CUDA cores (#1)
16 CUDA cores (#2)
16 Load/Store Units
16 Interpolation SFUs (not on NVIDIA's diagrams)
4 Special Function SFUs
4 Texture Units
With 2 warp scheduler/dispatch unit pairs in each SM, GF100 can utilize at most 2 of 6 execution units at any given time. It’s also because of the SM being divided up like this that it was possible for NVIDIA to add to it. GF104 in comparison has the following:

16 CUDA cores (#1)
16 CUDA cores (#2)
16 CUDA cores (#3)
16 Load/Store Units
16 Interpolation SFUs (not on NVIDIA's diagrams)
8 Special Function SFUs
8 Texture Units
This gives GF104 a total of 7 execution units, the core of which are the 3 blocks of 16 CUDA cores.

picture :)

With 2 warp schedulers, GF100 could put all 32 CUDA cores to use if it had 2 warps where both required the use of CUDA cores. With GF104 this gets more complex since there are now 3 blocks of CUDA cores but still only 2 warp schedulers. So how does NVIDIA feed 3 blocks of CUDA cores with only 2 warp schedulers? They go superscalar.

In a nutshell, superscalar execution is a method of extracting Instruction Level Parallelism from a thread. If the next instruction in a thread is not dependent on the previous instruction, it can be issued to an execution unit for completion at the same time as the instruction preceding it. There are several ways to extract ILP from a workload, with superscalar operation being something that modern CPUs have used as far back as the original Pentium to improve performance. For NVIDIA however this is new – they were previously unable to use ILP and instead focused on Thread Level Parallelism (TLP) to ensure that there were enough warps to keep a GPU occupied.
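
To make the superscalar idea from the quote a bit more concrete, here is a toy model, not how the hardware or Octane actually schedules anything. It assumes a strictly in-order stream in which an instruction can be issued in the same cycle as the one before it only when it does not depend on it. The function `issue_cycles` and the `deps` flags are hypothetical names for this sketch.

```python
def issue_cycles(deps, width):
    """Count cycles needed to issue an in-order instruction stream.

    deps[i] is True when instruction i depends on instruction i - 1,
    so it cannot be issued in the same cycle as its predecessor.
    width is the maximum number of instructions issued per cycle.
    """
    cycles = 0
    i = 0
    n = len(deps)
    while i < n:
        # Every cycle issues at least one instruction.
        issued = 1
        i += 1
        # Co-issue further instructions while they are independent
        # of the instruction right before them.
        while issued < width and i < n and not deps[i]:
            issued += 1
            i += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    dependent_chain = [False, True, True, True]     # each needs the previous result
    independent_ops = [False, False, False, False]  # no dependencies at all
    print(issue_cycles(dependent_chain, 2))  # 4
    print(issue_cycles(independent_ops, 2))  # 2
    print(issue_cycles(independent_ops, 1))  # 4
```

With a fully dependent chain the second issue slot never gets used, so dual issue takes as many cycles as single issue. Only code with independent neighbouring instructions benefits, which is the situation the quote describes for the third block of CUDA cores.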


So the GTX 460 has another block of 16 CUDA cores in each SM, but how efficiently these cores are used depends on the character of the application (much like Intel Hyper-Threading, which is almost useless in games).
:shock: :shock: :shock:
RESULT: CUDA 3.2 reports another 112 cores, but it only reports them. This architecture cannot use all 3 blocks in this version of Octane. Maybe radiance will find out how to do it :)
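
The 112 figure can be reproduced from the block counts in the quote, as a small sketch. It assumes the GTX 460 has 7 active SMs and the GTX 470 has 14 (taken from public specs, not from this thread). The helper names are made up, and the "without ILP" number is only a simple upper bound based on 2 warp schedulers per SM, not a measurement.

```python
CORES_PER_BLOCK = 16     # each CUDA core block in the quote has 16 cores
SCHEDULERS_PER_SM = 2    # warp scheduler/dispatch pairs per SM, per the quote

# Assumed SM counts from public specs, not stated in this thread.
GTX460_SMS = 7
GTX470_SMS = 14


def total_cores(sm_count, blocks_per_sm):
    """All CUDA cores on the chip: SMs x blocks per SM x 16."""
    return sm_count * blocks_per_sm * CORES_PER_BLOCK


def cores_fed_without_ilp(sm_count, blocks_per_sm):
    """Cores that can be kept busy if each scheduler feeds only one block."""
    busy_blocks = min(blocks_per_sm, SCHEDULERS_PER_SM)
    return sm_count * busy_blocks * CORES_PER_BLOCK


if __name__ == "__main__":
    print(total_cores(GTX460_SMS, 3))                               # 336
    print(total_cores(GTX460_SMS, 2))                               # 224
    print(total_cores(GTX460_SMS, 3) - total_cores(GTX460_SMS, 2))  # 112
    print(cores_fed_without_ilp(GTX460_SMS, 3))                     # 224
    print(cores_fed_without_ilp(GTX470_SMS, 2))                     # 448
```

Under these assumptions the third block accounts for exactly the 112 extra cores, and without instruction-level parallelism the GTX 460 can still only feed 224 of its 336 cores, while the GTX 470 can feed all 448.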
1 x GTX 460 2GB, Core i3 @ 3,7Ghz, 4GB Ram, Win 7 64-bit, 260.99 WHQL
Tugpsx
Licensed Customer
Posts: 1150
Joined: Thu Feb 04, 2010 8:04 pm
Location: Chicago, IL

Wow, thanks for the info.
On the topic of cards: Tigerdirect has the GTX 460 on sale for $109 USD.
I recently bought a system with a GTX 480 and am waiting for it to arrive. Can I use the 460 with a 480 in SLI mode?
Win 11 64GB | NVIDIA RTX3060 12GB