Constant CUDA errors on slave - can't fix it.

Maxon Cinema 4D (Export script developed by abstrax, Integrated Plugin developed by aoktar)

Moderators: ChrisHekman, aoktar

sdanaher
Licensed Customer
Posts: 332
Joined: Mon Apr 06, 2015 12:47 pm

My networked slave machine has developed a critical error that I can't fix. I really need some help.

It's a Mac Pro 5,1 running Win7 with 12GB RAM, a GTX 980 and a Titan X. I'm currently running R20/V4 for evaluation; usually it's R19/3.08.

nVidia Driver: 397.64
CUDA version: 9.1.85

Everything is stock, no overclocking.

This rig has been solid for a number of years and I've never had a problem with CUDA errors. This is suddenly what I get now:
CUDA error 719 on device 1: unspecified launch failure
-> could not get memory info
CUDA error 719 on device 1: unspecified launch failure
-> failed to destroy CUDA event
CUDA error 719 on device 1: unspecified launch failure
-> failed to deallocate pinned memory
CUDA error 719 on device 1: unspecified launch failure
-> could not get memory info
CUDA error 719 on device 1: unspecified launch failure
-> failed to destroy CUDA event
Started logging on 05.12.18 15:25:12

OctaneRender 4.00 (4000021)

CUDA error 700 on device 0: an illegal memory access was encountered
CUDA error 700 on device 1: an illegal memory access was encountered
-> failed to download symbol(stats data)
-> failed to wait for event
device 0: path tracing kernel failed
device 1: path tracing kernel failed
CUDA error 700 on device 1: an illegal memory access was encountered
CUDA error 700 on device 0: an illegal memory access was encountered
-> failed to load symbol data to the device(deep_data)
-> failed to load symbol data to the device(deep_data)
device 1: failed to upload the deep params
device 0: failed to upload the deep params
detected that all GPUs of the slave have failed -> restarting slave
CUDA error 700 on device 0: an illegal memory access was encountered
-> failed to destroy CUDA event
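To see at a glance which card the errors land on, log lines like the ones above can be tallied by device and error code. This is only a rough sketch: the function name `tally_cuda_errors` and the `SAMPLE_LOG` excerpt are made up for illustration, and the pattern assumes the `CUDA error N on device M: ...` wording shown in this log. In the excerpt above, all five 719 errors are on device 1, while error 700 shows up on both devices, so a tally is a hint about where to look first, not proof of which card is faulty.

```python
import re
from collections import Counter

# Matches lines such as "CUDA error 719 on device 1: unspecified launch failure".
# Assumption: the log keeps exactly this wording, as in the excerpt above.
CUDA_ERROR = re.compile(r"CUDA error (\d+) on device (\d+):")


def tally_cuda_errors(log_text):
    """Count (device, error code) pairs found in a slave log excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CUDA_ERROR.search(line)
        if match:
            code = int(match.group(1))
            device = int(match.group(2))
            counts[(device, code)] += 1
    return counts


# Short hypothetical excerpt in the same format as the log above.
SAMPLE_LOG = """\
CUDA error 719 on device 1: unspecified launch failure
-> could not get memory info
CUDA error 700 on device 0: an illegal memory access was encountered
CUDA error 700 on device 1: an illegal memory access was encountered
device 0: path tracing kernel failed
"""

for (device, code), n in sorted(tally_cuda_errors(SAMPLE_LOG).items()):
    print(f"device {device}: CUDA error {code} x{n}")
```

Pasting the full slave log into a text file and passing its contents to `tally_cuda_errors` works the same way.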
I'm also seeing corruption on the connected display, and other weirdness. I did get it working again by uninstalling and reinstalling CUDA and the drivers (I first tried more current versions, which also failed, so I rolled back to nVidia driver 397.64 and CUDA 9.1.85).

I ran a stress test scene today and it rendered without problems for an hour.

BUT then I tried the scene on which the errors first appeared, and again that toasted the rig. Can a scene cause damage on a slave machine?

This scene was set up with Redshift, but I had removed all the Redshift-specific tags and settings and uninstalled Redshift. I don't know if that has anything to do with it, but the stress test scene ran fine immediately before.

I'm really struggling to fix this and, more importantly, to understand what the cause is. I hope it's not a dying card...
Windows 10 - 64GB RAM - Cinema 4D R20 - RTX 2070 x3
orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

Had similar issues and some of the same errors in slave windows.

Has happened to me across multiple slaves / different hardware.

Drivers seem to be a common factor in my experience. I run the oldest and most stable driver that the current Octane version allows.

Make sure you install using the "clean install" option. Failing that, I'd wipe the system and go for a clean install. Hopefully there's not much on your slave other than the Octane slave and the OS.

- Corruption on the display has never been a common issue for me. That is a potential flag for a busted card. Try the other steps, and maybe remove that card from the system, then retry.
sdanaher
Licensed Customer
Posts: 332
Joined: Mon Apr 06, 2015 12:47 pm

Thanks for the reply. Yeah, I was considering a wipe and reinstall of the OS, but I'll try a clean install of the driver first. Is there a way to test a card to determine if it's working properly?
sdanaher
Licensed Customer
Posts: 332
Joined: Mon Apr 06, 2015 12:47 pm

Oh, and do you know what the minimum CUDA install options are? There's a ton of stuff it wants to install, and you can't uninstall it in one go; you have to remove each part one by one.
orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

To test a card? Sure.

Just remove all the other cards, run a render stress test on that one card for a few hours, and see if it fails.
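While a single-card stress test like that is running, it can help to log temperature and memory use so there is something to look at if the card falls over. Below is a rough sketch that polls `nvidia-smi`, which ships with the NVIDIA driver. The function names and the sample values are made up for illustration, and it assumes `nvidia-smi` is on the PATH (on older Windows driver installs it may sit in the NVSMI folder under the NVIDIA Corporation program directory instead). Values are kept as strings because some cards report fields as not supported.

```python
import csv
import io
import subprocess
import time

# Fields requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header, no units) into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:
            values = [field.strip() for field in record]
            rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


def watch(samples=10, interval_s=30):
    """Print a timestamped reading for each GPU every interval_s seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for gpu in query_gpus():
            print(
                f"{stamp} GPU {gpu['index']} ({gpu['name']}): "
                f"{gpu['temperature.gpu']} C, "
                f"{gpu['memory.used']}/{gpu['memory.total']} MiB"
            )
        time.sleep(interval_s)
```

Calling `watch(samples=240, interval_s=30)` in a second console would cover about two hours of rendering. If the readings stop or `nvidia-smi` itself starts erroring around the time the slave reports a CUDA failure, that is one more sign pointing at the card or driver rather than the scene.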