
How to solve CUDA error 999?

Posted: Wed Nov 13, 2019 7:17 pm
by Hatsize7
I hope someone can help me; I've been struggling with this error for some time now.

How can I solve "CUDA error 999: failed to create link"?
Sometimes the message is "failed to allocate device array" instead.

I get this error message when I try to run OctaneBench (4.00c), and this render slave also fails when I try to render.

Long story short:
OS: Windows 10
Motherboard: ASRock H110 Pro BTC+ (13 PCIe slots)
RAM: 32 GB; storage: SSD
GPUs: 5 (3x GTX 1080, 2x GTX 1070 Ti)
Nvidia driver: 419.17

The system worked fine for months. Lately, though, it stops during rendering, which is a real pain: the machine restarts while the master is still waiting for the frame to finish (it usually hangs at around 98%). I noticed that even OctaneBench won't run once this happens.

I updated Windows to version 1903 and tried several Nvidia Studio drivers, removing the previous one with DDU (Display Driver Uninstaller) each time.
I tried MSI_util_v2, which others suggested for the watchdog BSOD problem (since this is a render slave, I can't see whether a BSOD occurs, but one did happen earlier when a monitor was plugged in). When the crash happens, Event Viewer shows a critical Kernel-Power error (Event ID 41). I also increased the virtual memory to 100 GB and removed the audio driver. All of these were suggested fixes for similar CUDA errors.
I tested the cards one by one: OctaneBench works with each card when it is the only one installed, and it keeps working with up to 3 cards. The CUDA error appears as soon as the 4th card is plugged in. I then moved the cards between PCIe slots until I found a layout where everything worked and OctaneBench ran smoothly again. I rendered for a few hours, then the machine restarted again, and now I get CUDA error 999 again when running OctaneBench.
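
The card-by-card test described above can also be scripted without physically unplugging anything. The sketch below is a minimal Python example, assuming the benchmark or render command honors the standard CUDA_VISIBLE_DEVICES environment variable (it is applied by the CUDA driver, so most CUDA applications respect it) and that it signals failure with a non-zero exit code. The script name gpu_bisect.py and the command line are hypothetical; if OctaneBench does not report errors through its exit code, you would have to check its output instead. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID numbers the GPUs in PCI bus order, which makes it easier to map a failing index to a physical slot. Keep in mind that hiding a GPU from CUDA does not remove its power draw or its riser from the system, so this only narrows down the software-visible side of the problem.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def single_device_sets(gpu_ids: Sequence[int]) -> List[str]:
    """One CUDA_VISIBLE_DEVICES value per GPU, e.g. [0, 1] -> ["0", "1"]."""
    return [str(i) for i in gpu_ids]


def cumulative_device_sets(gpu_ids: Sequence[int]) -> List[str]:
    """Enable GPUs one more at a time, e.g. [0, 1, 2] -> ["0", "0,1", "0,1,2"]."""
    return [",".join(str(i) for i in gpu_ids[: n + 1]) for n in range(len(gpu_ids))]


def run_with_devices(command: Sequence[str], visible: str, timeout_s: float = 600.0) -> int:
    """Run `command` with only the listed GPUs visible to CUDA.

    Returns the process exit code, or -1 if it did not finish within timeout_s
    (subprocess.run kills the child process when the timeout expires).
    """
    env = dict(os.environ)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs in PCI bus (slot) order
    env["CUDA_VISIBLE_DEVICES"] = visible    # hide every GPU not in this list
    try:
        return subprocess.run(list(command), env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return -1


def bisect_gpus(command: Sequence[str], gpu_ids: Sequence[int]) -> None:
    """Try every GPU alone, then growing groups, and report which runs fail."""
    # The first cumulative set is a single card, already covered by the single runs.
    runs = single_device_sets(gpu_ids) + cumulative_device_sets(gpu_ids)[1:]
    for visible in runs:
        code = run_with_devices(command, visible)
        status = "ok" if code == 0 else f"FAILED (exit code {code})"
        print(f"CUDA_VISIBLE_DEVICES={visible}: {status}", flush=True)


if __name__ == "__main__":
    # Hypothetical usage: python gpu_bisect.py <benchmark command> [args...]
    if len(sys.argv) < 2:
        print("usage: python gpu_bisect.py <command> [args...]")
    else:
        bisect_gpus(sys.argv[1:], list(range(5)))  # 5 GPUs in this rig; adjust as needed
```

If every single-card run passes but the runs start failing once a given number of cards is visible, that points toward a load or slot/riser issue rather than one faulty card, which matches the pattern of failures from the 4th card onward.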

The PSU is not the problem: it is a new 1500 W unit, and the cards never drew more than 130 W each (about 650 W for all five). No overclocking.

I'm thinking about trying Windows 7 to see if it works.
Can anybody help me and shed some light on this?
I have another slave with the same setup and 6 GPUs (all 1080s) that has been running smoothly since the beginning of the year, but this machine has been acting up for the past few days. What can I do?
Thank you in advance!

Re: How to solve CUDA error 999?

Posted: Thu Nov 14, 2019 8:48 am
by paride4331
Hello Hatsize7,
try installing this Nvidia driver:
https://www.nvidia.com/Download/driverR ... 3217/en-us
Regards
Paride

Re: How to solve CUDA error 999?

Posted: Fri Nov 15, 2019 10:48 am
by Hatsize7
Hello Paride,

I found another solution from Raphael in the dark catacombs of the forum, and it seems to have worked.

"**MAKE SURE YOU ACTIVATE THE "OPTIMIZE FOR COMPUTE" IN THE DRIVERS SETTINGS IN THE NVIDIA CONTROL PANEL** (recent drivers only and win10 <= not sure)

Also, you want to disable *all* the Nvidia HDMI sound devices in Device Manager, as they sometimes trigger the watchdog on big renders, which restarts the drivers and crashes the render."

It seems this helped. The machine has been rendering an animation at 100% with all 5 cards for the past 24 hours and hasn't run into any issues yet.

Thanks for your response anyway; your driver suggestion will be my next try if something goes wrong.