cuda error 700 on slave [OBSOLETE]
Posted: Tue Jan 05, 2016 2:42 pm
by rappet
Hi Paul,
I got a CUDA error 700 on the slave.
[Screenshots of slave and master; attachment: Tapperworks-report-160105-223.JPG]
Any idea?
greetz,
Re: cuda error 700 on slave
Posted: Tue Jan 05, 2016 2:47 pm
by rappet
Same scene, another render target.
On one slave it has errors, on the other slave no problems.
Re: cuda error 700 on slave
Posted: Tue Jan 05, 2016 11:59 pm
by face_off
Hi Jeroen - perhaps there is an issue with the card or driver on that particular slave? Suggest making sure all the drivers are the same, and install Octane Standalone via the installer EXE to ensure the nvidia timeouts are correct. If you export that scene from ArchiCAD to Standalone and network render from there - does the error still occur?
Paul
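One way to check that the master and every slave report the same driver is to compare `nvidia-smi` output from each machine. Below is a minimal Python sketch, assuming you have saved the output of `nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader` on each machine. The function names and the sample strings (including the driver versions) are made up for illustration and are not taken from the machines in this thread.

```python
def parse_gpu_report(text):
    """Parse 'index, name, driver_version' CSV lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, name, driver = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "driver": driver})
    return gpus


def driver_versions(reports):
    """Map each machine name to the set of driver versions it reports."""
    return {
        machine: {gpu["driver"] for gpu in parse_gpu_report(text)}
        for machine, text in reports.items()
    }


def drivers_match(reports):
    """True when all machines together report exactly one driver version."""
    all_versions = set().union(*driver_versions(reports).values())
    return len(all_versions) == 1


# Illustrative sample data only (hypothetical machines and versions).
sample_reports = {
    "master": "0, GeForce GTX 780 Ti, 361.43\n",
    "slave": "0, GeForce GTX 780 Ti, 359.06\n1, GeForce GTX 780 Ti, 359.06\n",
}

if __name__ == "__main__":
    print(driver_versions(sample_reports))
    print("drivers match:", drivers_match(sample_reports))
```

If the versions differ, aligning them before further testing removes one variable from the comparison between the working and the failing slave.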
Re: cuda error 700 on slave
Posted: Sun Jan 10, 2016 11:54 am
by rappet
Hi Paul,
The driver update did not do the job. I will try installing the Standalones again to be sure that it is done correctly.
greetz,
Edit: reinstalling the Standalone did not fix it either.
Edit 2: The slave PC is very slow and gives blue screens too. The C: hard disk was getting full as well.
I will clean up first to see if that helps.
Can there be any link with the CUDA error 700?
After freeing up space, cleaning up Windows, and trying other things, I tried again and I still got this error.
Re: cuda error 700 on slave
Posted: Sun Jan 10, 2016 1:44 pm
by rappet
Hi Paul,
I also tried Standalone 23.1 (instead of 24.2) and I also get the CUDA error 700 on device 1 on the slave when network rendering.
Then I rendered directly in Standalone on the machine that has trouble with device 1, to see whether it only happens when network rendering... and one GPU fails there too.
In the log it says:
"CUDA error on device 1: Error code unknown
-> failed to unload module"
Might one GPU be broken, and should I rather post this issue in the general forum?
greetz,
Re: cuda error 700 on slave
Posted: Sun Jan 10, 2016 8:18 pm
by abstrax
Sounds like a hardware issue, but to be sure please send us the scene with which you can reproduce the issue and also the GPU model you are using.
Re: cuda error 700 on slave
Posted: Sun Jan 10, 2016 9:52 pm
by rappet
I am afraid so. The error occurs with all scenes (I tried the benchmark scene), so sending a specific scene is of no use.
The model is a 780 Ti, but only one of the three cards has the error.
I will try to locate which one, and swap the slots they are in to see what happens.
Do you have any suggestions for how I can test and figure out what kind of issue this is?
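To narrow down which of the three cards is failing before physically swapping them, one option is to run the same test once per GPU with only that GPU visible. Below is a minimal Python sketch, assuming the render or benchmark you launch honors the standard CUDA environment variables `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER`. The helper names are illustrative, and `test_command` is a placeholder for whatever you actually run, not a real Octane command line. The single visible card is renumbered inside the process, so the device number printed in the log no longer identifies the physical card.

```python
import os
import subprocess
import sys


def isolated_env(device_index, base_env=None):
    """Return a copy of the environment that exposes only one CUDA device."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number devices by PCI bus order
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def run_per_device(test_command, device_count):
    """Run test_command once per device; return {device_index: exit_code}."""
    results = {}
    for index in range(device_count):
        completed = subprocess.run(test_command, env=isolated_env(index))
        results[index] = completed.returncode
    return results


if __name__ == "__main__":
    # Placeholder command: it only echoes the variable. Replace it with the
    # render or benchmark you actually want to run on each card.
    echo = [sys.executable, "-c",
            "import os; print('visible:', os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_per_device(echo, device_count=3))
```

If one index fails on its own while the others pass, swapping that card into another slot and repeating the run helps separate a card problem from a slot or power problem.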
Re: cuda error 700 on slave
Posted: Mon Jan 11, 2016 3:15 am
by face_off
Jeroen - try under-clocking the faulty card.
Paul
Re: cuda error 700 on slave
Posted: Mon Jan 11, 2016 11:26 am
by rappet
Up till now, underclocking does not seem to work... I even got a blue screen while underclocking.
And temps are not getting high.
greetz,
This is the log I get:
Started logging on 11.01.16 12:27:17
OctaneRender 2.24.2 (2240001)
CUDA error 719 on device 1: An exception occurred on the device while executing a kernel. The context cannot be used anymore and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed.
-> kernel execution failed (dl)
device 1: direct lighting failed
CUDA error 719 on device 1: An exception occurred on the device while executing a kernel. The context cannot be used anymore and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed.
-> failed to clear linear 2D device memory
CUDA error 719 on device 1: An exception occurred on the device while executing a kernel. The context cannot be used anymore and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed.
-> failed to allocated device memory
devic
Edit: the next time I got a CUDA 715 error.
So 700, 715, 719... what does that tell us?
Re: cuda error 700 on slave
Posted: Mon Jan 11, 2016 11:59 am
by face_off
Sounds like a faulty card.
Paul
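For reference, the three numbers reported in this thread are standard CUDA driver error codes: 700 is `CUDA_ERROR_ILLEGAL_ADDRESS`, 715 is `CUDA_ERROR_ILLEGAL_INSTRUCTION`, and 719 is `CUDA_ERROR_LAUNCH_FAILED`. All three are raised while a kernel is executing on the GPU. Below is a small Python sketch that tallies such errors per device from a log. The regular expression assumes the `CUDA error <code> on device <n>` wording seen in the log above, the function names are illustrative, and the sample log lines are abbreviated.

```python
import re
from collections import Counter

# Names from the CUDA driver API error enumeration.
CUDA_ERROR_NAMES = {
    700: "CUDA_ERROR_ILLEGAL_ADDRESS",
    715: "CUDA_ERROR_ILLEGAL_INSTRUCTION",
    719: "CUDA_ERROR_LAUNCH_FAILED",
}

# Assumes the log wording seen in this thread.
ERROR_LINE = re.compile(r"CUDA error (\d+) on device (\d+)", re.IGNORECASE)


def tally_cuda_errors(log_text):
    """Count (device, error code) pairs found in the log text."""
    counts = Counter()
    for match in ERROR_LINE.finditer(log_text):
        code, device = int(match.group(1)), int(match.group(2))
        counts[(device, code)] += 1
    return counts


def describe(counts):
    """Format the tally as readable lines, one per (device, code) pair."""
    lines = []
    for (device, code), n in sorted(counts.items()):
        name = CUDA_ERROR_NAMES.get(code, "unknown code")
        lines.append(f"device {device}: error {code} ({name}) x{n}")
    return lines


if __name__ == "__main__":
    # Abbreviated sample lines in the style of the log above.
    sample = (
        "CUDA error 719 on device 1: An exception occurred ...\n"
        "-> kernel execution failed (dl)\n"
        "CUDA error 719 on device 1: An exception occurred ...\n"
    )
    print("\n".join(describe(tally_cuda_errors(sample))))
```

A tally in which every error lands on the same device, across different scenes and Octane versions, fits the picture of one problematic card.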