How will CUDA 4 affect Octane's software architecture?
Sincerely
Rick
CUDA 4.0
Forum rules
Please add your OS and Hardware Configuration in your signature, it makes it easier for us to help you analyze problems. Example: Win 7 64 | Geforce GTX680 | i7 3770 | 16GB
There's a small article about it here. I was wondering as well whether this will give the Octane developers a headache like 3.2 did.
http://www.anandtech.com/show/4198/nvid ... es-cuda-40
Win 7 64bit, GF 460 2GB, Intel quad, 4GB memory
I don't know yet, but we will see when it's out. I'm not so much interested in the unified virtual addressing (UVA), which is too slow for our purposes, but I'm really looking forward to a hopefully new or improved compiler tool chain, since the current one has given us quite a few headaches in the past.
But let's wait and see.
Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
- Jaberwocky
- Posts: 976
- Joined: Tue Sep 07, 2010 3:03 pm
Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.
The memory across multiple cards in a system will become additive, if I am reading it right.
E.g. 2 x 2 GB cards would give you 4 GB to play with.
Now that would be a bit of a game changer.

CPU:-AMD 1055T 6 core, Motherboard:-Gigabyte 990FXA-UD3 AM3+, Gigabyte GTX 460-1GB, RAM:-8GB Kingston hyper X Genesis DDR3 1600Mhz D/Ch, Hard Disk:-500GB samsung F3 , OS:-Win7 64bit
A CUDA 4.0 release candidate is available now.
http://developer.nvidia.com/object/cuda ... loads.html
Looks like it's got plenty of low-level changes. I would think that it's going to be a bit of a wait before a future release of Octane can be fully migrated. The last CUDA version, 3.2, seemed to delay updates a bit, and this 4.0 is still only a release candidate.
Win7 x64 - i7 920 - 6GB RAM - GTX470 - Blender - 3DCoat - Octane.
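Jaberwocky wrote:
Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.
The memory across multiple cards in a system will become additive, if I am reading it right.
E.g. 2 x 2 GB cards would give you 4 GB to play with.
Now that would be a bit of a game changer.

Unfortunately, the devil lies in the detail. Each GPU needs access to everything. If you distributed the scene data over several GPUs or even the CPU, you would have to fetch the data from the other GPUs or the CPU, and everything via PCI... That's super slow and not practical for our uses.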

Cheers,
Marcus
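A minimal sketch of the capacity arithmetic behind Marcus's point in this thread that each GPU needs access to everything. The function name and the `replicated` flag are hypothetical, chosen only for illustration, and driver and framebuffer overhead are ignored. It is not how Octane manages memory, just the sum-versus-minimum comparison.

```python
def usable_scene_memory_gb(vram_per_card_gb, replicated=True):
    """Return the largest scene (in GB) that fits, ignoring all overhead.

    replicated=True  : every GPU holds a full copy of the scene.
    replicated=False : hypothetical pooled layout, scene split across cards.
    """
    if not vram_per_card_gb:
        raise ValueError("need at least one card")
    if replicated:
        # Every card needs the whole scene, so the smallest card is the limit.
        return min(vram_per_card_gb)
    # Pooled: capacities add up, but remote data must be fetched over the bus.
    return sum(vram_per_card_gb)


cards = [2.0, 2.0]  # two 2 GB cards, as in the example above
print("replicated:", usable_scene_memory_gb(cards, replicated=True))
print("pooled:    ", usable_scene_memory_gb(cards, replicated=False))
```

With a full copy per card, adding a second card adds rendering speed but not scene capacity. The pooled figure only applies if fetching remote data is fast enough to be usable.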
Qtoken wrote:
A CUDA 4.0 release candidate is available now.
http://developer.nvidia.com/object/cuda ... loads.html
Looks like it's got plenty of low-level changes. I would think that it's going to be a bit of a wait before a future release of Octane can be fully migrated. The last CUDA version, 3.2, seemed to delay updates a bit, and this 4.0 is still only a release candidate.

Actually, the changes were a lot smaller than you would expect from the PowerPoints NVIDIA has floated around before. Octane builds and runs fine with CUDA 4.0, and there are no big surprises regarding speed. Unfortunately the multi-GPU changes are more trivial than what I was hoping for after reading the PowerPoints, which probably means that the multi-GPU rewrite will go on as originally planned.
Cheers,
Marcus
- Jaberwocky
- Posts: 976
- Joined: Tue Sep 07, 2010 3:03 pm
abstrax wrote:
Jaberwocky wrote:
Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.
The memory across multiple cards in a system will become additive, if I am reading it right.
E.g. 2 x 2 GB cards would give you 4 GB to play with.
Now that would be a bit of a game changer.
Unfortunately, the devil lies in the detail. Each GPU needs access to everything. If you distributed the scene data over several GPUs or even the CPU, you would have to fetch the data from the other GPUs or the CPU, and everything via PCI... That's super slow and not practical for our uses.
Cheers,
Marcus

You mean even over PCIe x16 v2.0 slots?

Perhaps we need to wait for PCIe 3.0 slots.
http://www.eetimes.com/electronics-news ... cification.
I suppose then, of course, there would be a backward compatibility issue.

Jaberwocky wrote:
You mean even over PCIe x16 v2.0 slots?
Perhaps we need to wait for PCIe 3.0 slots.
http://www.eetimes.com/electronics-news ... cification.
I suppose then, of course, there would be a backward compatibility issue.

No external bus will be able to help here. Bandwidth is not the problem, but latency. Any memory that is accessed randomly and used in the inner loops of your algorithms needs to be fetched as quickly as possible. Usually you don't load heaps of data, only a few bytes, but you have to wait for them, i.e. your core is basically twiddling its thumbs during that time. Caches reduce the problem, but in the end light can travel only so far during one clock cycle (a few centimeters only), which means you want to have your memory physically as close as possible. And you achieve that only with on-board memory (which is already slow compared to caches).
Fortunately there is help coming from another direction: the amount of VRAM is increasing continuously. The GTX 580 can already be bought with 3 GB.

Cheers,
Marcus
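A rough back-of-the-envelope sketch of the latency argument above. The 1 microsecond bus latency, the 16-byte fetch size, and the 0.5 signal velocity factor are illustrative assumptions, not measurements, and the model ignores caches and the latency hiding that GPUs achieve by running many threads. Only the speed of light and the roughly 8 GB/s per direction of PCIe 2.0 x16 are standard figures.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value in vacuum


def fetch_time_s(num_fetches, bytes_per_fetch, latency_s, bandwidth_bytes_per_s):
    """Time for fetches issued one after another: each pays the full latency
    plus the time to transfer its payload."""
    return num_fetches * (latency_s + bytes_per_fetch / bandwidth_bytes_per_s)


def round_trip_reach_cm(clock_hz, velocity_factor=0.5):
    """Distance a signal can travel there and back within one clock cycle.
    velocity_factor is the assumed signal speed as a fraction of c."""
    one_way_m = SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz
    return one_way_m / 2 * 100  # halve for the round trip, convert m to cm


ASSUMED_BUS_LATENCY_S = 1e-6   # assumption for illustration only
PCIE2_X16_BYTES_PER_S = 8e9    # about 8 GB/s per direction
DOUBLED_BYTES_PER_S = 16e9     # a bus with twice the bandwidth, same latency

slow = fetch_time_s(1_000_000, 16, ASSUMED_BUS_LATENCY_S, PCIE2_X16_BYTES_PER_S)
fast = fetch_time_s(1_000_000, 16, ASSUMED_BUS_LATENCY_S, DOUBLED_BYTES_PER_S)
print(f"1M small fetches, 8 GB/s:  {slow:.3f} s")
print(f"1M small fetches, 16 GB/s: {fast:.3f} s")
print(f"round-trip reach at 1.5 GHz: {round_trip_reach_cm(1.5e9):.1f} cm")
```

Under these assumptions, doubling the bandwidth changes the total time by about 0.1 percent, because almost all of the time is spent waiting for each small fetch to arrive. The reach figure comes out at about 5 cm, the same order of magnitude as the "few centimeters" mentioned above.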
- Jaberwocky
- Posts: 976
- Joined: Tue Sep 07, 2010 3:03 pm
OK, thanks for the insight, abstrax.