Quadro/Tesla better for Octane than GeForce?
Posted: Fri May 20, 2011 10:52 pm
I just want to ask, out of curiosity: does the GeForce's limited/crippled double-precision compute performance compared to Teslas/Quadros have any influence on Octane performance? AFAIK high-end GeForce cards are capped at 1/8 of the double-precision throughput of a Tesla. Does this mean a Tesla with the same number of CUDA cores as a GeForce is 8x faster, or is this double-precision thing irrelevant for Octane?
Another question: CUDA 4.0 seems to add a new feature that allows copying/moving content from the memory of one GPU to the memory of another GPU (on dual-GPU cards via the NF200 bridge), if I understood that correctly. Will it affect Octane in any form / improve its performance?
EDIT: I suppose I was not clear enough on the second question, so this is what I was talking about:
CUDA 4.0 removes this burden and gives developers a unified memory base to work with. Sanford Russell, director of CUDA marketing, said that this new capability allows for “peer-to-peer memory access within the node.”
"If you have two GPUs in a node, the way this worked prior to 4.0, you literally had to copy the objects to main memory through the CPU, then copy it back out and put it on GPU No. 2. There were a whole bunch of extra steps required. We now have a peer-to-peer capability, where it literally is a copy from memory to memory. It goes across the PCIX bus, and it’s no longer going to system memory."