Cuda error 702

Maxon Cinema 4D (Export script developed by abstrax, Integrated Plugin developed by aoktar)


Post by orb101 » Fri Oct 06, 2017 9:40 am

Hi there

A machine I use as an Octane slave with 4x 1080 Ti cards is repeatedly giving me the following error in the slave console.

[Attachment: error.png]


Everything works fine at first. After it has been rendering, sometimes for a few frames and sometimes for a while, I get a 702 and the console freezes.
The scene type, complexity, memory usage, etc. do not seem to have any effect on the crash. The scene can be simple or very complex and the issue is the same.
The error is always a 702, but the description varies. It can be "failed to copy memory from device", "failed to allocate/deallocate memory", or "could not get memory info".
At this point it can also freeze the master (presumably it's waiting for the slave and getting nothing).
Sometimes the slave machine's OS becomes unresponsive and throws a BSOD with a DPC_WATCHDOG_VIOLATION.

I've tried to isolate the cards and render with each of them separately. I can't get any single card to fail, but any combination of cards will fail.
I have reseated every card and tried again to narrow it down to a failing card or cards, without success.

Stress tests like FurMark and VRAM tests show no errors on the cards, and during the tests none of the cards exceeded a heat limit or crashed. They're all hybrid watercooled and the temps never go above 50 degrees.

The driver the cards are using is a stable Nvidia driver that I use on another Octane machine with no issues.
BIOS, firmware, and HDD on the slave machine all have the latest updates.

Aside from Cinema 4D and the slave install for Octane, there's nothing else on the machine except the OS, which is Windows 10 Pro.

Machine specs:
X99-E WS USB 3.1 mobo
4x 1080 Ti EVGA Hybrid
Samsung 850 Evo M.2 500GB
64GB Corsair DDR4 RAM
Intel i7-6850K

Any help greatly appreciated.
orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

Re: Cuda error 702

Post by bepeg4d » Fri Oct 06, 2017 9:53 am

Hi orb101,
CUDA error 702 is generally related to hardware or driver issues.
Have you tried rendering with only 3 GPUs, leaving one GPU for the system/monitor only?
ciao beppe
bepeg4d
Octane Guru
Posts: 9954
Joined: Wed Jun 02, 2010 6:02 am
Location: Italy

Re: Cuda error 702

Post by orb101 » Fri Oct 06, 2017 9:57 am

Hi Beppe

I tried:

4x GPUs: fail
3x GPUs: fail
2x GPUs: fail
1x GPU: success

I then tried to isolate the failing card, but found that any combination of two cards or more always failed.
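
The isolation procedure described above (single cards, then pairs, then larger groups) can be written down as an explicit test plan so that no combination is skipped. This is a minimal Python sketch; the function name `gpu_test_plan` and the device indices 0 to 3 are illustrative assumptions, and the actual rendering with each subset still has to be done by hand in Octane.

```python
from itertools import combinations

def gpu_test_plan(gpu_indices, min_size=1):
    """List every subset of GPUs to render with, smallest subsets first."""
    plan = []
    for size in range(min_size, len(gpu_indices) + 1):
        # combinations() yields each group of `size` cards exactly once
        plan.extend(combinations(gpu_indices, size))
    return plan

plan = gpu_test_plan([0, 1, 2, 3])
for subset in plan:
    print(subset)
```

For four cards this gives 15 runs: 4 single cards, 6 pairs, 4 triples, and the full set.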

I thought it might be a bad PCIe slot, but rotating the cards showed that it could fail no matter which slots I use.

SLI options are turned off in Nvidia control panel.

Any chance this is to do with system RAM or HDD?

Re: Cuda error 702

Post by bepeg4d » Fri Oct 06, 2017 10:04 am

Hi orb101,
I'd suspect the motherboard BIOS and Win 10 more.
It may be worth trying a different HDD and Win 7.
ciao beppe

Re: Cuda error 702

Post by orb101 » Mon Oct 09, 2017 3:46 pm

For those interested, I did a lot more testing with various GPUs, all of which failed when attached to one of the PCIe slots.

It looks like a fault on the motherboard, and it's on its way back to the manufacturer for further tests.

Re: Cuda error 702

Post by tobydalsgaard » Thu Oct 12, 2017 4:08 pm

I know this is probably too late for you since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased. The fix was to downgrade the Nvidia drivers. I haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.
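
To check which driver a machine is actually running before and after a downgrade, the version can be read from `nvidia-smi`. This is a minimal Python sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_version`, `installed_driver_versions`, and `report` are hypothetical, and 382.33 is simply the version reported as stable in this thread.

```python
import subprocess

def parse_version(text):
    """Turn a driver version string such as '382.33' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))

def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported by each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_version(line) for line in out.splitlines() if line.strip()]

KNOWN_GOOD = parse_version("382.33")  # version reported as stable in this thread

def report():
    """Print, per GPU, whether the installed driver matches KNOWN_GOOD."""
    try:
        versions = installed_driver_versions()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
        return
    for index, version in enumerate(versions):
        status = "matches" if version == KNOWN_GOOD else "differs from"
        print(f"GPU {index}: driver {'.'.join(map(str, version))} {status} 382.33")

report()
```

Comparing tuples rather than strings keeps the ordering numeric, so 388.59 sorts after 382.33 as expected.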
tobydalsgaard
Licensed Customer
Posts: 17
Joined: Mon May 15, 2017 2:00 pm

Re: Cuda error 702

Post by coilbook » Sat Oct 14, 2017 5:28 am

Same here:
-> kernel execution failed(report)
CUDA error 702 on device 3: the launch timed out and was terminated
-> failed to launch kernel(ptBrdf2)
device 3: direct light kernel failed
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 1: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 1: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 3: the launch timed out
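
A console dump like this is easier to read once the errors are counted per device. This is a minimal Python sketch, assuming the console output has been saved as text and that error lines follow the "CUDA error N on device D: message" shape seen above; the function name `count_cuda_errors` and the shortened `sample_log` are illustrative.

```python
import re
from collections import Counter

# Matches lines shaped like: "CUDA error 702 on device 3: the launch timed out ..."
CUDA_ERROR_RE = re.compile(r"CUDA error (\d+) on device (\d+):")

def count_cuda_errors(log_text):
    """Return a Counter keyed by (error_code, device_index)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CUDA_ERROR_RE.search(line)
        if match:
            code, device = int(match.group(1)), int(match.group(2))
            counts[(code, device)] += 1
    return counts

# Shortened excerpt of the console output, used only as sample input.
sample_log = """\
-> kernel execution failed(report)
CUDA error 702 on device 3: the launch timed out and was terminated
-> failed to launch kernel(ptBrdf2)
device 3: direct light kernel failed
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
"""

for (code, device), n in sorted(count_cuda_errors(sample_log).items()):
    print(f"error {code} on device {device}: {n} occurrence(s)")
```

If the timeouts are spread across several devices, as in the dump above, that is at least consistent with a system-level cause (driver, OS, or motherboard) rather than a single bad card, though it does not prove it.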
coilbook
Licensed Customer
Posts: 2985
Joined: Mon Mar 24, 2014 2:27 pm

Re: Cuda error 702

Post by joeycamacho » Sun Oct 15, 2017 11:14 pm

tobydalsgaard wrote: I know this is probably too late for you since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased. The fix was to downgrade the Nvidia drivers. I haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.


Thanks, this was helpful, especially after the Windows Creators Update. It seems to have done some strange things to my rig, but this helped nonetheless.
Also, for other viewers: CUDA 702 can sometimes be caused by a Windows TDR (Timeout Detection and Recovery) setting that will lock up the computer. Read more about it here: https://www.pugetsystems.com/labs/hpc/W ... ience-777/
Hopefully this will help others.
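
For context: with TDR, Windows resets the display driver if a GPU does not respond within `TdrDelay` seconds (2 by default), and a CUDA application then sees a launch timeout. The timeout is controlled by the `TdrDelay` and `TdrDdiDelay` DWORD values under the `GraphicsDrivers` registry key documented by Microsoft. This is a minimal Python sketch that only builds the text of a .reg file; the helper name `build_tdr_reg` and the 60-second values are illustrative assumptions, not a recommendation from this thread.

```python
# Registry key documented by Microsoft for TDR (Timeout Detection and Recovery).
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

def build_tdr_reg(tdr_delay_s, tdr_ddi_delay_s):
    """Build the text of a .reg file that sets TdrDelay and TdrDdiDelay (seconds)."""
    if tdr_delay_s < 0 or tdr_ddi_delay_s < 0:
        raise ValueError("delays must be non-negative")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg files store DWORD values as eight hexadecimal digits
        f'"TdrDelay"=dword:{tdr_delay_s:08x}',
        f'"TdrDdiDelay"=dword:{tdr_ddi_delay_s:08x}',
        "",
    ]
    return "\r\n".join(lines)

# Illustrative values only: 60 seconds for both timeouts.
print(build_tdr_reg(60, 60))
```

Importing such a file needs administrator rights, and the change only takes effect after a reboot. It is worth backing up the registry first.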
Freelance Motion & Graphic Designer
-----------------------
Cinema 4D R19| Win 10 Home 1803 | 4X GTX TitanX 12GB Maxwell | Intel Core i7 5930K 3.5GHz | 48 GB DDR4 -2133 RAM
joeycamacho
Licensed Customer
Posts: 42
Joined: Sat May 24, 2014 6:30 pm

Re: Cuda error 702

Post by FAZ » Wed Oct 18, 2017 1:18 pm

tobydalsgaard wrote: I know this is probably too late for you since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased. The fix was to downgrade the Nvidia drivers. I haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.


Just to confirm: your post saved my life! :)

Going back to that specific driver solved a very similar problem I was having as well. Thank you!
www.alexdimella.com IG @alexdimella
FAZ
Licensed Customer
Posts: 90
Joined: Tue Oct 06, 2015 5:56 pm
Location: Miami, FL

Re: Cuda error 702

Post by rustyippolito » Fri Dec 15, 2017 8:18 pm

I have been having the exact same problems with my 4x 1080 Ti rig. I have tried swapping out cards one at a time and switching them around, and whenever I try to render with more than one card enabled, I get intermittent system freezes and even a few BSODs. I tried downgrading from 388.59 to 382.33, and now none of my GPUs show up at all in the Octane plugin or in the standalone. Has anyone cracked this problem yet? I have had BOXX (it is a maxed-out BOXX APEX4 workstation) send me multiple replacement Nvidia cards and RAM, with NO luck. I am about to send the whole damned machine back to them to see if they can find a solution, but even they have said that they are not sure where to look, other than the MB. If anyone has any info that could help us find a direction to go in, we would appreciate it.
rustyippolito
Licensed Customer
Posts: 88
Joined: Wed Mar 08, 2017 10:07 pm