Yes, that all sounds correct to me, except: a GTX 460 is still heaps (more than 2x) faster than a GTX 260. I have got one GTX 260 at home, and on the chess scene I get ~4.15 Msamples/s, while on the GTX 460 I get ~8.75 Msamples/s.
Jaberwocky wrote:
to samCameron
Further to my post above.
This is a quote from the AnandTech site I linked to, regarding testing of the GF104 Fermi chip used in the GTX 460:
"Due to the fact that NVIDIA added an additional block of CUDA cores to an SM without adding another warp scheduler, the resulting superscalar design requires that the card extract ILP from the warps in order to simultaneously utilize all 3 blocks of CUDA cores.
As a result the range of best case to worst case scenarios is wider on GF104 than it is GF100: while GF100 could virtually always keep 2 warps going and reach peak utilization, GF104 can only reach peak utilization when at least 1 of the warps has an ILP-safe instruction waiting to go, otherwise the 3rd block of CUDA cores is effectively stalled and a GTX 460 performs more like a 224 CUDA core part. Conversely with a total of 4 dispatch units GF104 is capable of exceeding GF100’s efficiency by utilizing 4 of 7 execution blocks in an SM instead of 2 of 6.
Or in other words, GF104 has the possibility of being more or less efficient than GF100."
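As a rough check on the "224 CUDA core part" figure in the quote, here is a minimal sketch of the arithmetic. The values 7 active SMs and 16 cores per block are assumptions taken from the published GTX 460 specification, not from this thread, and `active_cores` is just an illustrative helper name.

```python
# Assumed GTX 460 (GF104) layout; these figures are not stated in the thread.
SMS = 7               # active SMs on a GTX 460
CORES_PER_BLOCK = 16  # CUDA cores in one execution block
BLOCKS_PER_SM = 3     # blocks of CUDA cores per SM (the "3 blocks" in the quote)


def active_cores(blocks_in_use: int) -> int:
    """CUDA cores doing work if each SM keeps `blocks_in_use` blocks busy."""
    return SMS * CORES_PER_BLOCK * blocks_in_use


peak = active_cores(BLOCKS_PER_SM)        # all three blocks fed (needs ILP)
no_ilp = active_cores(BLOCKS_PER_SM - 1)  # third block stalled, no ILP found

print(f"peak: {peak} cores, without ILP: {no_ilp} cores")
print(f"fraction of peak without ILP: {no_ilp / peak:.2f}")
```

Under those assumptions, the worst case leaves a third of the card's cores idle, which matches the 224-core figure in the quote.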
This may be a possible explanation for why the GTX 460, which uses the GF104 GPU design, is still no faster under CUDA 3.2 even though it appears to use all of its cores. It is in fact only as fast as an old GTX 260/280 card and a lot slower than the GTX 465/470 cards, which use the older GF100 GPU.
Perhaps someone at Refractive could comment.
In general there seems to be a misconception regarding cores: usually it's not possible to keep all cores busy all the time. Every time there is a branch in the code, it is possible that some of the shader processors (cores) used for one block have to wait. The problem with the GTX 460 is that the block size is larger, and therefore a larger number of cores can potentially be stalled. I guess this is what we see here. BUT as I said before: the GTX 460 is still a lot better than a GTX 260.
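To illustrate the stalling at a branch, here is a toy model in Python. It is only a sketch under simplifying assumptions: it looks at a single group of 32 threads that execute in lockstep (the warp size on NVIDIA hardware), treats each side of an if/else as taking a fixed number of cycles, and ignores everything else the scheduler does. The function name `warp_utilization` and the cost parameters are made up for the example.

```python
WARP_SIZE = 32  # threads that execute one instruction in lockstep


def warp_utilization(takes_branch, cost_if=1, cost_else=1):
    """Fraction of lane-cycles doing useful work for one warp at an if/else.

    takes_branch: one bool per thread, True if that thread takes the 'if' side.
    """
    assert len(takes_branch) == WARP_SIZE
    n_if = sum(1 for t in takes_branch if t)
    n_else = WARP_SIZE - n_if
    # A side of the branch costs cycles only if at least one thread needs it;
    # when both sides are needed they run one after the other.
    cycles = (cost_if if n_if else 0) + (cost_else if n_else else 0)
    # Threads are idle while the side they did not take is being executed.
    useful = n_if * cost_if + n_else * cost_else
    return useful / (WARP_SIZE * cycles)


uniform = [True] * WARP_SIZE                    # every thread agrees
split = [i % 2 == 0 for i in range(WARP_SIZE)]  # half go each way

print("uniform branch:", warp_utilization(uniform))
print("split branch:  ", warp_utilization(split))
```

In this model a branch where all threads agree keeps every lane busy, while a branch that splits the threads leaves half of the lane-cycles idle. It says nothing about how much of this applies to a particular render kernel.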
Cheers,
Marcus