CUDA Error

Posted: Tue Apr 17, 2018 7:13 pm
by pryzm
Anyone seen this message in the log file before?

00:00:14 (0014.44) * OCTANE API MSG: CUDA error 999 on device 3: unknown error
00:00:14 (0014.44) * OCTANE API MSG: -> failed to bind device to current thread
00:00:14 (0014.44) * OCTANE API MSG: device 3: failed to initialize resources
00:00:14 (0014.44) |
00:00:14 (0014.44) | ++++++++++++++++++++++++++
00:00:14 (0014.44) | +++ IPR RENDER FAILURE +++ processing the failure callback
00:00:14 (0014.44) | ++++++++++++++++++++++++++
00:00:14 (0014.44) |
00:00:14 (0014.70) | ... second update
00:00:14 (0014.70) | ... isRenderingPaused
00:00:14 (0014.70) | Render setup Ok.

It just started happening. I was playing with the hardware to try to get things cooler (using a PCIe extender), and now when I load this particular LW scene (it has TFD in it), it gives me this error and won't render.
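
For anyone who wants to pull these errors out of a longer log, here is a minimal Python sketch. It assumes the log lines look exactly like the ones quoted above; the regular expression and the function name `find_cuda_errors` are illustrative and not part of Octane or the plugin. Code 999 corresponds to `CUDA_ERROR_UNKNOWN` in the CUDA driver API, so the number itself does not narrow down the cause.

```python
import re

# Pattern derived only from the line format quoted in this thread;
# other Octane versions may word the message differently.
CUDA_ERROR = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}).*"
    r"CUDA error (?P<code>\d+) on device (?P<device>\d+): (?P<text>.+)$"
)


def find_cuda_errors(log_text):
    """Return (time, error_code, device_index, message) for each CUDA error line."""
    hits = []
    for line in log_text.splitlines():
        match = CUDA_ERROR.match(line.strip())
        if match:
            hits.append(
                (
                    match["time"],
                    int(match["code"]),
                    int(match["device"]),
                    match["text"].strip(),
                )
            )
    return hits


# The first three lines of the log posted above.
SAMPLE = """\
00:00:14 (0014.44) * OCTANE API MSG: CUDA error 999 on device 3: unknown error
00:00:14 (0014.44) * OCTANE API MSG: -> failed to bind device to current thread
00:00:14 (0014.44) * OCTANE API MSG: device 3: failed to initialize resources
"""

if __name__ == "__main__":
    for hit in find_cuda_errors(SAMPLE):
        print(hit)
```

Running this over a whole session log and counting hits per device index would show whether the failure follows one physical card or moves between slots.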

Re: CUDA Error

Posted: Tue Apr 17, 2018 8:48 pm
by juanjgon
Hi,

Hmm, this crash sounds to me like a problem with these extenders. One thing that you can try if you are using GPU extenders or external boxes is to disable tone mapping for all of those GPUs, leaving the option enabled only for the GPUs attached directly to the main board. The "use for tone map" option is available in the same panel used to configure the GPUs.

Thanks,
-Juanjo

Re: CUDA Error

Posted: Tue Apr 17, 2018 8:52 pm
by mikefrisk
You could try increasing your driver's timeout...

Open the registry and go to "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

Edit the value "TdrDelay" (create it as a DWORD if it doesn't exist). Its default is 2 seconds; try setting it higher, to something like 60 or 90.

Basically this gives your graphics driver 60 or 90 seconds to respond rather than 2. I believe heavy scenes take a long time to transfer in and out of VRAM, and the default simply isn't enough time for that to happen, so Windows assumes the GPU has hung and resets the driver.
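
If you would rather not edit the registry by hand, the change can be written as a .reg file and imported. The following is a rough Python sketch that only builds the text of such a file; the helper name `tdr_reg_file` is made up for illustration, and it assumes the standard "Windows Registry Editor Version 5.00" layout with the value stored as a DWORD in seconds.

```python
# Registry path quoted in the post above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(seconds):
    """Build the text of a .reg file that sets TdrDelay to `seconds`.

    Hypothetical helper for illustration; it does not touch the registry.
    """
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        # .reg files store DWORDs as eight hexadecimal digits.
        f'"TdrDelay"=dword:{seconds:08x}\n'
    )


if __name__ == "__main__":
    print(tdr_reg_file(60))
```

Save the printed text as a .reg file, import it with administrator rights, and reboot so the new timeout takes effect. Back up the key first.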

Re: CUDA Error

Posted: Tue Apr 17, 2018 9:07 pm
by pryzm
Awesome, thank you both. I'll try both suggestions. I haven't had any previous problems with any scenes (I should have left well enough alone), but the cards were getting too hot (and throttling) and I was hoping to create some relief. I have removed the extender and plugged all the cards back in directly, and I still have the issue. One card will drop without any response (with a driver crash) and I have to reboot the machine or re-install the drivers to get it back. I removed that card and it happened to a second one in a different slot. With that revelation, I'm hopeful it's not a hardware issue.

Re: CUDA Error

Posted: Tue Apr 17, 2018 10:52 pm
by pryzm
No go on either... :cry:
I get a message that says the driver crashed but successfully recovered (same as before), and then one card is dropped from those available, even to the OS. A restart or re-install gets it back.
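
One way to confirm which card has dropped, and how hot the others are running, is to ask the driver directly through `nvidia-smi`. This is a small Python sketch, assuming `nvidia-smi` is on the PATH; the function names and the sample rows are hypothetical and not taken from this machine.

```python
import subprocess

# Query fields supported by `nvidia-smi --query-gpu`.
QUERY = "index,name,temperature.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a CSV field to int, or None if the driver reports no number."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, temp, used, total = [f.strip() for f in line.split(",")]
        gpus.append(
            {
                "index": _to_int(index),
                "name": name,
                "temp_c": _to_int(temp),
                "mem_used_mib": _to_int(used),
                "mem_total_mib": _to_int(total),
            }
        )
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU the driver still lists."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


# Hypothetical sample output, for illustration only.
SAMPLE = "0, Example GPU A, 71, 512, 6144\n1, Example GPU B, 83, 5900, 6144\n"

if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi is not available or failed:", exc)
```

Comparing the list before and after a failed render shows which index disappeared, and the temperature column helps separate thermal problems from the crash itself.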

Re: CUDA Error

Posted: Wed Apr 18, 2018 12:35 am
by pryzm
Strange thing here. When I load the VDB version of the TFD solve, there is no such error when rendering. The VDB doesn't show up in IPR, but the IPR window works and renders as normal. I just don't know how to get VDBs to render yet :)

I'm wondering if there is a memory issue (is 6 GB enough for this TFD calculation?)...
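
A rough way to sanity-check the memory question is to estimate the size of a dense voxel grid. This Python sketch assumes every channel is stored as uncompressed 32-bit floats at full grid resolution; how TFD caches its data and how Octane holds the volume on the GPU are not known here, so treat the result as a ballpark figure only. The grid size and channel count in the example are hypothetical.

```python
def dense_grid_gib(nx, ny, nz, channels, bytes_per_value=4):
    """Estimate the size in GiB of a dense voxel grid.

    Assumes one value of `bytes_per_value` bytes per channel per voxel,
    with no compression or sparsity (an illustrative assumption).
    """
    total_bytes = nx * ny * nz * channels * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical example: a 512^3 container with four float channels.
    print(dense_grid_gib(512, 512, 512, channels=4))  # 2.0
```

Under these assumptions a 512^3 grid with four channels already takes 2 GiB before geometry, textures, and render buffers are counted, so a large container can plausibly push a 6 GB card close to its limit.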

Re: CUDA Error

Posted: Wed Apr 18, 2018 2:32 pm
by mikefrisk
I've had crashes where one card is dropped before and I couldn't ever figure out why.

This issue ONLY occurs when rendering TFD?

Re: CUDA Error

Posted: Wed Apr 18, 2018 4:25 pm
by pryzm
Seems so. I've loaded up other projects I've been working on and I don't recall any of them crashing at all.

Re: CUDA Error

Posted: Wed Apr 18, 2018 9:01 pm
by mikefrisk
Check your driver version as well as any updates to TFD.

Re: CUDA Error

Posted: Wed Apr 18, 2018 11:50 pm
by pryzm
Thanks for the suggestion. I've got both the latest drivers and the latest update for TFD.