CUDA error 702

Maxon Cinema 4D (Export script developed by abstrax, Integrated Plugin developed by aoktar)


orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

Hi there

A machine I use as an Octane slave, with 4x 1080 Ti cards, repeatedly gives me the following error in the slave console.
[Attachment: error.png]
Everything works fine at first. After rendering for a while, sometimes only a few frames, sometimes longer, I get a 702 and the console freezes.
The scene type, complexity, and memory usage do not seem to have any effect on the crash: a simple scene and a very complex one fail the same way.
The error is always a 702, but the description varies. It can be "failed to copy memory from device", "failed to allocate/deallocate memory", or "could not get memory info".
At this point it can also freeze the master (presumably it is waiting for the slave and getting nothing).
Sometimes the slave machine's OS becomes unresponsive and throws a DPC_WATCHDOG_VIOLATION BSOD.

I've tried isolating the cards and rendering with each of them separately. I can't get any single card to fail, but any combination of cards will fail.
I have reseated every card and tried again to narrow it down to a failing card or cards, without success.

Tests like FurMark and VRAM tests show no errors on the cards, and during stress tests none of the cards exceeded its heat limit or crashed. They're all hybrid water-cooled and the temps never go above 50 degrees.

The driver the cards are using is a stable Nvidia driver that I use on another octane machine with no issues.
BIOS, firmware, and HDD on the slave machine all have the latest updates.

Aside from Cinema 4D and the Octane slave install, there's nothing else on the machine except the OS, which is Windows 10 Pro.

Machine specs:
X99-E WS/USB 3.1 motherboard
4x EVGA 1080 Ti Hybrid
Samsung 850 EVO M.2 500 GB
64 GB Corsair DDR4 RAM
Intel i7-6850K

Any help greatly appreciated.
bepeg4d
Octane Guru
Posts: 10365
Joined: Wed Jun 02, 2010 6:02 am
Location: Italy

Hi orb101,
In general, CUDA error 702 is related to hardware or driver issues.
Have you tried rendering with only 3 GPUs, leaving one GPU for the system/monitor only?
ciao beppe
orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

Hi Beppe

I tried:

4x GPUs: fail
3x GPUs: fail
2x GPUs: fail
1x GPU: success

I then tried to isolate the failing card, but found that any combination of two cards or more always failed.

I thought it might be a bad PCIe slot, but rotating the cards showed that it could fail no matter which slots I use.

SLI options are turned off in the NVIDIA Control Panel.

Any chance this is to do with system RAM or HDD?
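
For anyone repeating the isolation tests described above, here is a minimal Python sketch of one way to enumerate every combination of two or more cards and then check whether a single card is common to all failing combinations. The function names `gpu_test_plan` and `suspects` are hypothetical helpers made up for illustration; the actual render test for each combination still has to be run by hand in Octane.

```python
from itertools import combinations


def gpu_test_plan(gpu_ids, min_size=2):
    """Return every subset of gpu_ids with at least min_size cards, smallest first."""
    plan = []
    for size in range(min_size, len(gpu_ids) + 1):
        plan.extend(combinations(gpu_ids, size))
    return plan


def suspects(results):
    """Given {subset: passed}, return the GPUs present in every failing subset.

    An empty set means no single card explains all failures, which points
    away from one bad card and toward something shared (slot, board, driver).
    """
    failing = [set(subset) for subset, passed in results.items() if not passed]
    if not failing:
        return set()
    return set.intersection(*failing)


if __name__ == "__main__":
    plan = gpu_test_plan([0, 1, 2, 3])
    print(len(plan), "combinations to test")
    # Hypothetical outcome matching the report above: every combination fails.
    print(suspects({subset: False for subset in plan}))
```

If every multi-card combination fails, as reported here, the intersection is empty, so the results do not single out one card.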
bepeg4d
Octane Guru
Posts: 10365
Joined: Wed Jun 02, 2010 6:02 am
Location: Italy

Hi orb101,
I suspect the motherboard BIOS or Windows 10 more.
It may be worth trying a different HDD with Windows 7.
ciao beppe
orb101
Licensed Customer
Posts: 20
Joined: Thu May 26, 2016 8:39 pm

For those interested, I did a lot more testing with various GPUs, all of which failed when attached to one of the PCIe slots.

It looks like a fault on the motherboard, and it's on its way back to the manufacturer for further tests.
tobydalsgaard
Licensed Customer
Posts: 17
Joined: Mon May 15, 2017 2:00 pm

I know this is probably too late for you, since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased; the fix was to downgrade the NVIDIA drivers. Haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.
coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

Same here:

-> kernel execution failed(report)
CUDA error 702 on device 3: the launch timed out and was terminated
-> failed to launch kernel(ptBrdf2)
device 3: direct light kernel failed
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 1: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 1: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 3: the launch timed out
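
A log like the one above is easier to read once the errors are tallied per device. The following is a rough Python sketch that assumes the console output has been saved as plain text in the same line format as this post; `errors_per_device` and `SAMPLE` are illustrative names, not part of Octane.

```python
import re
from collections import Counter

# Matches lines such as "CUDA error 702 on device 3: the launch timed out ..."
CUDA_ERROR = re.compile(r"CUDA error (\d+) on device (\d+)")


def errors_per_device(log_text):
    """Count CUDA errors in log_text, keyed by (device index, error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CUDA_ERROR.search(line)
        if match:
            code, device = int(match.group(1)), int(match.group(2))
            counts[(device, code)] += 1
    return counts


# A few lines copied from the log above, used only as sample input.
SAMPLE = """\
CUDA error 702 on device 3: the launch timed out and was terminated
-> failed to launch kernel(ptBrdf2)
CUDA error 702 on device 0: the launch timed out and was terminated
-> could not get memory info
CUDA error 702 on device 0: the launch timed out and was terminated
"""

if __name__ == "__main__":
    for (device, code), n in sorted(errors_per_device(SAMPLE).items()):
        print(f"device {device}: error {code} x{n}")
```

If the counts are spread across several devices rather than concentrated on one, that is consistent with the earlier reports that no single card could be blamed.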
joeycamacho
Licensed Customer
Posts: 42
Joined: Sat May 24, 2014 6:30 pm

tobydalsgaard wrote: I know this is probably too late for you, since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased; the fix was to downgrade the NVIDIA drivers. Haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.

Thanks, this was helpful, especially after the Windows Creators Update, which seems to have done some strange things to my rig.
Also, for other viewers: CUDA 702 can sometimes be caused by a Windows TDR (Timeout Detection and Recovery) setting that will lock up the computer. Read more about it here - https://www.pugetsystems.com/labs/hpc/W ... ience-777/
Hopefully this will help others.
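
As a rough illustration of the TDR setting mentioned above: Windows reads the `TdrDelay` and `TdrDdiDelay` values from the `GraphicsDrivers` registry key. The Python sketch below only builds the text of a `.reg` file and does not touch the registry. The function name `tdr_reg_file` and the 60-second value are assumptions chosen for illustration, not a recommendation taken from the linked article.

```python
# Registry key where Windows keeps its TDR settings.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(delay_seconds=60):
    """Return .reg file text that sets TdrDelay and TdrDdiDelay.

    delay_seconds is an illustrative assumption; pick a value that suits
    your own renders.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    dword = f"dword:{delay_seconds:08x}"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        f'"TdrDelay"={dword}',
        f'"TdrDdiDelay"={dword}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(tdr_reg_file(60))
```

Importing such a file requires administrator rights, and the change typically only takes effect after a reboot. Back up the registry first.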
Freelance Motion & Graphic Designer
-----------------------
Cinema 4D R19| Win 10 Home 1803 | 4X GTX TitanX 12GB Maxwell | Intel Core i7 5930K 3.5GHz | 48 GB DDR4 -2133 RAM
FAZ
Licensed Customer
Posts: 90
Joined: Tue Oct 06, 2015 5:56 pm
Location: Miami, FL

tobydalsgaard wrote: I know this is probably too late for you, since you have already shipped off the mobo:

I had the exact same issues with a second PC that we had purchased; the fix was to downgrade the NVIDIA drivers. Haven't had a single problem since. Just FYI, the version I downgraded to was 382.33, which is what our other PC was running.

Just to confirm: your post saved my life! :)

Going back to that specific driver solved a very similar problem I was having as well. Thank you!
www.alexdimella.com IG @alexdimella
rustyippolito
Licensed Customer
Posts: 96
Joined: Wed Mar 08, 2017 10:07 pm

I have been having the exact same problems with my 4x 1080 Ti rig. I have tried swapping out cards one at a time and switching them around, and whenever I try to render with more than one card enabled, I get intermittent system freezes and even a few BSODs. I tried downgrading from 388.59 to 382.33, and now none of my GPUs show up at all in the Octane plugin or in the standalone. Has anyone cracked this problem yet?

BOXX (it is a maxed-out BOXX APEX4 workstation) has sent me multiple replacement NVIDIA cards and RAM, with NO luck. I am about to send the whole damned machine back to them to see if they can find a solution, but even they have said they are not sure what to diagnose other than the motherboard. If anyone has any info that could point us in a direction, we would appreciate it.