How to solve a CUDA error 999?
Posted: Wed Nov 13, 2019 7:17 pm
I hope someone can help me; I have been struggling with this error for some time now.
How can we solve a "CUDA error 999: failed to create link" error?
Sometimes the message is "failed to allocate device array".
I get this error message when I try to run Octanebench (4.00c), and obviously this render slave doesn't work when I try to render either.
Long story short:
The system is Windows 10, motherboard is ASRock H110 Pro BTC+ (13 PCI-E slots)
32 GB of RAM, SSD disk
5 GPUs (3 GTX 1080, 2 GTX 1070Ti)
Nvidia driver currently running is 419.17
The system was working fine for months. Lately it stops during rendering, which is a real pain because the system restarts and the master keeps waiting for the frame to finish (it usually hangs at around 98%). I realized that even Octanebench won't run when this happens.
I updated Windows to 1903. I tried different Nvidia Studio drivers, cleaning them with DDU.
I tried the MSI_util_v2 suggested by others for the Watchdog BSOD problem (since it is a slave, I don't know if there is a BSOD, but it happened before when a monitor was plugged in). In the Event Viewer I see a critical error 41 (Kernel-Power) when this happens. I changed the virtual memory to 100 GB. I removed the audio driver. (All of these were solutions for other similar CUDA errors.)
I tried the cards one by one, and Octanebench works with each when only one is installed. Everything works with up to 3 cards plugged in. I started getting the CUDA error message when the 4th card was plugged in. Then I tried different PCI-E slots for the cards and found a configuration where it all worked, and Octanebench ran smoothly again. I rendered for a few hours, then the machine restarted again, and now I get the CUDA error 999 again while running Octanebench.
The PSU is not the problem: it is a new 1500 W unit and the cards never drew more than 130 W each. No overclocking.
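For anyone repeating the card-by-card test, here is a minimal sketch of how the state of each card could be logged after every change. It assumes `nvidia-smi` (installed with the Nvidia driver) is on the PATH. The helper names `parse_gpu_csv` and `query_gpus` and the choice of fields are illustrative assumptions; the script only reports what the driver sees and does not reproduce the Octane error itself.

```python
import csv
import io
import subprocess

# Properties accepted by `nvidia-smi --query-gpu`; which ones to log is an assumption.
FIELDS = [
    "index",
    "name",
    "pcie.link.gen.current",
    "pcie.link.width.current",
    "power.draw",
    "temperature.gpu",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header, no units) into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Ask the driver for one CSV row per GPU it currently detects."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` after plugging in each additional card and printing the rows would show whether the 4th and 5th cards are detected at all and at what PCI-E link width, which may help separate a slot or riser problem from a driver problem.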
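The power reasoning above can be written out as a rough calculation. This is only a sketch: the 130 W figure is the observed per-card maximum, the draw of the CPU and motherboard was not measured (so the hypothetical `other_watts` parameter defaults to 0), and the function name `power_budget` is made up for illustration.

```python
def power_budget(card_watts, psu_watts, other_watts=0.0):
    """Return (total draw, headroom, fraction of PSU rating) in watts."""
    total = sum(card_watts) + other_watts
    return total, psu_watts - total, total / psu_watts


# 5 cards at the observed maximum of 130 W each, on a 1500 W PSU.
cards = [130.0] * 5
total, headroom, load = power_budget(cards, 1500.0)
print(f"{total:.0f} W drawn, {headroom:.0f} W headroom, {load:.0%} of PSU rating")
# prints "650 W drawn, 850 W headroom, 43% of PSU rating"
```

Software readings are averages, so this does not rule out short power spikes, but on sustained load the GPUs alone stay well under half of the PSU rating.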
I'm thinking about trying Windows 7 to see if it works.
Can anybody help me and shed some light on this?
I have another slave with the same setup and 6 GPUs (1080s) that has been running smoothly since the beginning of the year; somehow this machine has been acting up in the past few days. What can I do?
Thank you in advance!