Octane crash when running with mixed cards 2080ti & 1080ti

Newtek Lightwave 3D (exporter developed by holocube, Integrated Plugin developed by juanjgon)

Moderator: juanjgon

NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

I've run into an odd problem. I have a system with 3x 2080 Ti and 1x 1080 Ti on a GPU cluster, plus 1x 2080 Ti and 1x 1080 Ti seated directly in the motherboard.
When I load a scene that uses around 5 GB of GPU RAM, Octane fails to render in both IPR and F10 renders.

I get the following CUDA errors:

00:01:47 (0107.89) | Reading preview image successfully
00:01:47 (0107.89) | [profile] Function "GetPreviewImage" over "" execution time: 0.004 seconds
00:01:48 (0108.67) * OCTANE API MSG: CUDA error 2 on device 2: out of memory
00:01:48 (0108.67) * OCTANE API MSG: CUDA error 2 on device 0: out of memory
00:01:48 (0108.67) * OCTANE API MSG: -> failed to allocate device array
00:01:48 (0108.67) * OCTANE API MSG: -> failed to allocate device array
00:01:48 (0108.67) * OCTANE API MSG: CUDA error 2 on device 1: out of memory
00:01:48 (0108.67) * OCTANE API MSG: -> failed to allocate device array
00:01:48 (0108.67) |
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) | +++ IPR RENDER FAILURE +++ processing the failure callback
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) |
00:01:48 (0108.67) |
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) | +++ IPR RENDER FAILURE +++ processing the failure callback
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) |
00:01:48 (0108.67) |
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) | +++ IPR RENDER FAILURE +++ processing the failure callback
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) |
00:01:48 (0108.67) * OCTANE API MSG: CUDA error 2 on device 4: out of memory
00:01:48 (0108.67) * OCTANE API MSG: -> failed to allocate device array
00:01:48 (0108.67) |
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) | +++ IPR RENDER FAILURE +++ processing the failure callback
00:01:48 (0108.67) | ++++++++++++++++++++++++++
00:01:48 (0108.67) |
00:01:48 (0108.67) | IPR: reset image callback
00:01:48 (0108.67) | IPR: wait for the getPreviewImage function
00:01:48 (0108.78) | IPR: free the OpenGL buffers
00:01:48 (0108.78) | IPR: rendering done
00:01:48 (0108.82) * OCTANE API MSG: CUDA error 2 on device 2: out of memory
00:01:48 (0108.82) * OCTANE API MSG: -> failed to load module
00:01:48 (0108.82) * OCTANE API MSG: device 2: failed to load compiled OSL code:

00:01:48 (0108.82) * OCTANE API MSG: device 2: failed to compile module 0
00:01:48 (0108.82) * OCTANE API MSG: CUDA error 999 on device 4: unknown error
00:01:48 (0108.82) * OCTANE API MSG: -> failed to link to cubin
00:01:48 (0108.82) * OCTANE API MSG: device 4: failed to compile OSL code:

00:01:48 (0108.82) * OCTANE API MSG: device 4: failed to compile module 0
00:01:49 (0109.07) * OCTANE API MSG: CUDA error 2 on device 1: out of memory
00:01:49 (0109.07) * OCTANE API MSG: -> failed to allocate device memory
00:01:49 (0109.07) * OCTANE API MSG: CUDA error 2 on device 0: out of memory
00:01:49 (0109.07) * OCTANE API MSG: -> failed to allocate device memory


All of the cards have around 11 GB free, so this shouldn't happen.
Here's where it gets odd: if I disable all the 2080 Tis, IPR opens just fine with no errors, and the scene also renders its 240 frames without any problems. The same goes if I disable the 1080 Tis. So it seems mixing cards is somehow causing a CUDA crash.
I have tested the same scene with just the 1080 Tis and with just the 2080 Tis, and the renders completed without issue. I've also exported an ORBX file of the same scene and tested it in Standalone. With all cards running, Standalone renders just fine with no crash. The problem seems to be only on the plug-in side.
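For reference, if anyone wants to cross-check what the driver itself reports per card outside of Octane, below is a minimal sketch, assuming only that the CUDA toolkit is installed, that prints free vs. total VRAM for every device with cudaMemGetInfo. The file name and build line are just illustrative.

```cpp
// check_vram.cu -- minimal sketch: report free/total VRAM per CUDA device.
// Assumes the CUDA toolkit is installed; build with: nvcc check_vram.cu -o check_vram
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, i);
        cudaSetDevice(i);                    // memory queries apply to the current device
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);
        std::printf("device %d (%s): %.2f GB free of %.2f GB\n",
                    i, prop.name,
                    freeBytes / 1073741824.0, totalBytes / 1073741824.0);
    }
    return 0;
}
```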

My system specs are:

AMD Ryzen Threadripper 1950X 16-core CPU @ 3.4 GHz
32 GB system RAM
Windows 10 Pro 64-bit
1x 1200 W PSU
1x 1080 Ti & 1x 2080 Ti seated directly in the motherboard
Asus X399 Zenith Extreme motherboard

Nvidia drivers tested: 417.22 & 417.71
Octane 4.01 & the current Octane 2018 build; both show the same failure.

2x 1200 W PSUs powering the Amfeltec cluster & cards
Amfeltec GPU cluster with 3x 2080 Ti & 1x 1080 Ti
Win 7 pro 64bit GTX780
mikefrisk
Licensed Customer
Posts: 172
Joined: Fri Aug 29, 2014 9:17 pm

I don't know if this will help your situation, but I had a similar problem and it turned out to be my Logitech keyboard/mouse drivers causing DPC latency. Try disabling/uninstalling all unnecessary drivers.
NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

mikefrisk wrote:I don't know if this will help your situation, but I had a similar problem and it turned out to be my Logitech keyboard/mouse drivers causing DPC latency. Try disabling/uninstalling all unnecessary drivers.
I don't have a Logitech mouse driver. :)
Win 7 pro 64bit GTX780
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

What version are You running? The error seems very similar to one that has already been solved, so if You are not on the latest build I would encourage You to try it.
NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

glimpse wrote:What version are You running? The error seems very similar to one that has already been solved, so if You are not on the latest build I would encourage You to try it.
I'm running 4.02 of the LightWave plug-in now. Still the same problem: it runs out of GPU RAM even though there's 4+ GB free on each card after the scene loads.
Win 7 pro 64bit GTX780
NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

glimpse wrote:What version are You running? The error seems very similar to one that has already been solved, so if You are not on the latest build I would encourage You to try it.
Just re-tested v2018.01 RC3; same problem.
Win 7 pro 64bit GTX780
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

Could You try switching off the cluster and running only with the mixed cards that have enough bandwidth (the ones directly seated in Your motherboard)?
Octane is pretty stable with 417.71 (I would avoid the latest one).

Is that happening with any heavier scene, or only in this specific one?
Interesting, because as You mentioned, I haven't noticed any issue mixing cards in Standalone either.
juanjgon
Octane Plugin Developer
Posts: 8867
Joined: Tue Jan 19, 2010 12:01 pm
Location: Spain

Yes, if this problem happens only in the LightWave plugin, perhaps it is related to the code used to get the Octane buffers for the IPR or the rendering preview window. It is a really weird issue, though, and in any case it could be related to a driver issue or to a bug in the Octane core that is not reproducible in the Standalone.

It would be great to know if the same problem happens with other plugins on this same system, or if it is only a LW issue ...

Thanks,
-Juanjo
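For readers unfamiliar with what that buffer hand-off involves, the sketch below is a generic illustration of the kind of CUDA <-> OpenGL interop an IPR/preview window typically uses; it is not the actual LightWave plugin or Octane API code, and all names and structure here are hypothetical.

```cpp
// Generic illustration only -- NOT the actual LightWave plugin or Octane API code.
// Shows the kind of CUDA <-> OpenGL hand-off an IPR/preview window typically does:
// the rendered frame is copied into a GL pixel buffer that the host app then draws.
// Assumes a current OpenGL context, GLEW (or another GL loader), and the CUDA toolkit.
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

struct PreviewBuffer {
    GLuint pbo = 0;                       // pixel buffer object drawn by the viewport
    cudaGraphicsResource* res = nullptr;  // CUDA handle to the same buffer
};

bool createPreviewBuffer(PreviewBuffer& buf, int width, int height) {
    glGenBuffers(1, &buf.pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, buf.pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size_t(width) * height * 4, nullptr, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    // Registration allocates driver-side resources on the *current* CUDA device,
    // which is one reason mixed multi-GPU setups can behave differently here.
    return cudaGraphicsGLRegisterBuffer(&buf.res, buf.pbo,
                                        cudaGraphicsRegisterFlagsWriteDiscard) == cudaSuccess;
}

void uploadFrame(PreviewBuffer& buf, const void* rgba, size_t bytes) {
    void* devPtr = nullptr;
    size_t mapped = 0;
    cudaGraphicsMapResources(1, &buf.res);                        // lock the GL buffer for CUDA
    cudaGraphicsResourceGetMappedPointer(&devPtr, &mapped, buf.res);
    cudaMemcpy(devPtr, rgba, bytes < mapped ? bytes : mapped, cudaMemcpyHostToDevice);
    cudaGraphicsUnmapResources(1, &buf.res);                      // hand it back to OpenGL
}
```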
NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

glimpse wrote:Could You try switching off the cluster and running only with the mixed cards that have enough bandwidth (the ones directly seated in Your motherboard)?
Octane is pretty stable with 417.71 (I would avoid the latest one).

Is that happening with any heavier scene, or only in this specific one?
Interesting, because as You mentioned, I haven't noticed any issue mixing cards in Standalone either.
Okay, when running the system with just the two internal cards, it renders fine.

I've gone through all the cards, keeping one internal card enabled each time; with just two cards there are no problems, and everything renders with no errors.

It looks like the error happens when there are more than four cards and the scene is bigger than 5 GB.

As I said above, Standalone renders on all six cards without any problems. Before the scene is loaded to the GPUs, all cards report around 10-11 GB free. Even once the scene is loaded (and crashes), there is still free space reported on the GPUs; none show maxed-out RAM usage.
Win 7 pro 64bit GTX780
NemesisCGI
Licensed Customer
Posts: 111
Joined: Sat Apr 06, 2013 3:18 am

It looks like the maximum number of cards I can render with is four when the scene is larger than 5 GB. Note that all six cards work great if the scene is smaller than 5 GB.
I just don't get why the error is out of memory when there is enough; something is making a heavy demand when it shouldn't ...

I trust you guys will find the source of the problem at some stage. For now I can cope with running just the four 2080 Ti cards on that system. Work is heavy, though; I have lots of projects to complete in the next few days, so testing time has been hard to find. Keep in mind this takes forever to test: the scene takes five minutes to load, and loading it to the GPUs is slow too. After a crash it's a reboot, as I can't even use the system properly afterwards.
Win 7 pro 64bit GTX780
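Since the cards keep reporting free VRAM while the allocation fails, one further cross-check is whether each card can actually satisfy a single large allocation of roughly the scene's size: a single oversized request can fail even while headroom is still reported. A minimal sketch, again assuming only the CUDA toolkit is installed; the 5 GB figure is taken from the scene size mentioned above.

```cpp
// alloc_test.cu -- sketch: try one large cudaMalloc per device to see whether a
// single contiguous request of roughly the scene's size can be satisfied, since
// "out of memory" can be returned for one oversized allocation even while the
// card still reports free VRAM overall.
// Assumes the CUDA toolkit; build with: nvcc alloc_test.cu -o alloc_test
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t request = 5ULL << 30;   // 5 GB, the approximate scene size discussed above
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        void* p = nullptr;
        cudaError_t err = cudaMalloc(&p, request);
        std::printf("device %d: 5 GB allocation %s (%s)\n", i,
                    err == cudaSuccess ? "succeeded" : "FAILED",
                    cudaGetErrorString(err));
        if (err == cudaSuccess) cudaFree(p);
        cudaDeviceReset();               // release this device's context before moving on
    }
    return 0;
}
```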