CUDA 4.0

reberts2
Licensed Customer
Posts: 105
Joined: Tue Feb 23, 2010 2:48 pm

How will CUDA 4 affect Octane's software architecture?
Sincerely
Rick
Win 7 64 | GTX260 x 2 | Core i7 860 | 4GB & OSX 10.6 | 8600M GT | Core 2 Duo | 4GB
Skeletor
Licensed Customer
Posts: 83
Joined: Fri Aug 06, 2010 7:43 pm

There's a small article about it here. I was wondering as well whether this will give the Octane developers a headache like 3.2 did.
http://www.anandtech.com/show/4198/nvid ... es-cuda-40
Win 7 64bit, GF 460 2GB, Intel quad, 4GB memory
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

I don't know yet, but we will see when it's out. I'm not so interested in the UMA (too slow for our purposes), but I'm really looking forward to a hopefully new/improved compiler tool chain, since the old one gave us quite a few headaches in the past.

But let's wait and see :)

Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Jaberwocky
Licensed Customer
Posts: 976
Joined: Tue Sep 07, 2010 3:03 pm

Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.

The memory across multiple cards in a system would become additive, if I am reading it right.

E.g. 2 x 2GB cards would give you 4GB to play with.

Now that would be a bit of a game changer.

:o
CPU:-AMD 1055T 6 core, Motherboard:-Gigabyte 990FXA-UD3 AM3+, Gigabyte GTX 460-1GB, RAM:-8GB Kingston hyper X Genesis DDR3 1600Mhz D/Ch, Hard Disk:-500GB samsung F3 , OS:-Win7 64bit
Qtoken
Licensed Customer
Posts: 40
Joined: Fri Oct 08, 2010 2:43 pm
Location: Sault Ste Marie, ON, Canada

A CUDA 4.0 release candidate is available now.
http://developer.nvidia.com/object/cuda ... loads.html

Looks like it's got plenty of low-level changes. I would think it's going to be a bit of a wait before a future release of Octane can be fully migrated. The last CUDA version, 3.2, seemed to delay updates a bit, and 4.0 is still only a release candidate.
Win7 x64 - i7 920 - 6GB RAM - GTX470 - Blender - 3DCoat - Octane.
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

Jaberwocky wrote:Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.

The memory across multiple cards in a system would become additive, if I am reading it right.

E.g. 2 x 2GB cards would give you 4GB to play with.

Now that would be a bit of a game changer.

:o
Unfortunately, the devil lies in the details ;) Each GPU needs access to everything. If you distributed the scene data over several GPUs or even the CPU, each GPU would then have to fetch that data from the other GPUs or from the CPU. And everything via PCIe ... That's far too slow and not practical for our purposes.
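A rough back-of-envelope sketch of this point. The latency figures below are order-of-magnitude assumptions for illustration, not measurements:

```python
# Cost of fetching a few bytes of scene data from local VRAM vs. from a
# peer GPU over PCIe. Both latency figures are rough, assumed values.

LOCAL_VRAM_LATENCY_NS = 400       # assumed: a few hundred cycles of GDDR access
PCIE_ROUNDTRIP_LATENCY_NS = 2000  # assumed: small peer-to-peer read over PCIe

def stall_ratio(remote_ns: float, local_ns: float) -> float:
    """How many times longer a core waits on a remote fetch vs. a local one."""
    return remote_ns / local_ns

ratio = stall_ratio(PCIE_ROUNDTRIP_LATENCY_NS, LOCAL_VRAM_LATENCY_NS)
print(f"A remote fetch stalls the core ~{ratio:.0f}x longer than a local one")
```

Since a path tracer does many small, random reads per ray rather than a few big transfers, every one of those reads would pay the remote penalty, which is why pooling memory across cards doesn't help here.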

Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

Qtoken wrote:A CUDA 4.0 release candidate is available now.
http://developer.nvidia.com/object/cuda ... loads.html

Looks like it's got plenty of low-level changes. I would think it's going to be a bit of a wait before a future release of Octane can be fully migrated. The last CUDA version, 3.2, seemed to delay updates a bit, and 4.0 is still only a release candidate.
Actually, the changes were a lot smaller than you would expect from the PowerPoints NVIDIA floated around beforehand: Octane builds and runs fine with CUDA 4.0, and there were no big surprises regarding speed. Unfortunately, the multi-GPU changes are more trivial than what I was hoping for after reading the PowerPoints, which probably means that the multi-GPU rewrite will go on as originally planned.

Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Jaberwocky
Licensed Customer
Posts: 976
Joined: Tue Sep 07, 2010 3:03 pm

abstrax wrote:
Jaberwocky wrote:Looking at it in more detail, it looks like you will be able to pool the cards' memory, e.g. fit the scene across multiple cards.

The memory across multiple cards in a system would become additive, if I am reading it right.

E.g. 2 x 2GB cards would give you 4GB to play with.

Now that would be a bit of a game changer.

:o
Unfortunately, the devil lies in the details ;) Each GPU needs access to everything. If you distributed the scene data over several GPUs or even the CPU, each GPU would then have to fetch that data from the other GPUs or from the CPU. And everything via PCIe ... That's far too slow and not practical for our purposes.

Cheers,
Marcus

You mean even over PCIe x16 v2.0 slots :o

Perhaps we need to wait for PCIe v3.0 slots.

http://www.eetimes.com/electronics-news ... cification.

I suppose then, of course, there would be a backward compatibility issue. :geek:
CPU:-AMD 1055T 6 core, Motherboard:-Gigabyte 990FXA-UD3 AM3+, Gigabyte GTX 460-1GB, RAM:-8GB Kingston hyper X Genesis DDR3 1600Mhz D/Ch, Hard Disk:-500GB samsung F3 , OS:-Win7 64bit
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

Jaberwocky wrote: You mean even over PCIe x16 v2.0 slots :o

Perhaps we need to wait for PCIe v3.0 slots.

http://www.eetimes.com/electronics-news ... cification.

I suppose then, of course, there would be a backward compatibility issue. :geek:
No external bus will be able to help here. Bandwidth is not the problem, but latency. Any memory that is accessed randomly and used in the inner loops of your algorithms needs to be fetched as quickly as possible. Usually you don't load heaps of data - only a few bytes - but you have to wait for them, i.e. your core is basically twiddling its thumbs during that time. Caches reduce the problem, but in the end light can travel only so far during one clock cycle (a few centimeters), which means you want your memory physically as close as possible. And you achieve that only with on-board memory (which is already slow compared to caches).
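To put numbers on the light-travel argument (the clock speed here is an assumed ~1.5 GHz, roughly a GTX 580-era shader clock):

```python
# Distance light covers in one GPU clock cycle - the physical limit on
# how far away memory can usefully sit. The clock speed is an assumption.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
CLOCK_HZ = 1.5e9  # assumed ~1.5 GHz shader clock

cycle_time_s = 1.0 / CLOCK_HZ
distance_cm = SPEED_OF_LIGHT_M_PER_S * cycle_time_s * 100

print(f"One cycle lasts {cycle_time_s * 1e9:.2f} ns")
print(f"Light in vacuum covers ~{distance_cm:.0f} cm per cycle")
```

On-chip electrical signals propagate several times slower than light in vacuum, so the practical reach per cycle really does shrink to a few centimeters, well short of a round trip across a PCIe bus.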

Fortunately there is help coming from another direction: It looks like the amount of VRAM is increasing continuously. The GTX 580 can already be bought with 3GB ;)

Cheers,
Marcus
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Jaberwocky
Licensed Customer
Posts: 976
Joined: Tue Sep 07, 2010 3:03 pm

Ok thanks for the insight Abstrax.
CPU:-AMD 1055T 6 core, Motherboard:-Gigabyte 990FXA-UD3 AM3+, Gigabyte GTX 460-1GB, RAM:-8GB Kingston hyper X Genesis DDR3 1600Mhz D/Ch, Hard Disk:-500GB samsung F3 , OS:-Win7 64bit