Is my card overheating? CUDA Error 702

portnicki
Licensed Customer
Posts: 124
Joined: Mon Mar 10, 2014 9:28 am
Location: Warsaw

Hi all,

Following this forum and its tips on the best bang-for-buck hardware for Octane, I decided to go for an aftermarket GeForce GTX580 with 3GB of RAM. I actually got it by trading in a GTX760.
I'm not sure this was a smart move.

Oh yeah, since I'll probably buy another GTX580 3GB in the future, I also purchased a brand new PSU, a BeQuiet DarkPower Pro 1200W. So the PSU should not be the issue here.


My problem is that the card runs at its stock clocks on the newest drivers, and I still get CUDA Error 702 after just a few minutes of rendering. I also get artifacts in the FurMark benchmark after 5-7 minutes of testing.
My chassis is well ventilated, and I've disassembled the card and applied fresh thermal paste (Arctic Silver 5), but still no go :(
GPU-Z shows that the core temperature peaks at 72 degrees Celsius. That is not very much for a high-end GPU stressed in a torture test like FurMark, right?

Any ideas?

Here's what the Octane log has to say about this:
Started logging on 18.05.14 10:16:03

OctaneRender version 1.50 (1500001)

CUDA error 702 on device 0: The device kernel took too long to execute. This can only occur if timeouts are enabled. The context cannot be used anymore and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed.
-> Kernel execution failed (pt)
CUDA device 0: Path tracing failed


Oh yeah, my system doesn't crash or anything like that. Octane just stops rendering (the render shows no further progress), and sometimes Windows shows a dialog saying that the display driver shut down but restored itself (whatever that means).

I've heard that artifacts in FurMark indicate that the card is overheating. Not necessarily the GPU; it could be the RAM or VRAM, for that matter.

Has anyone had similar experiences?
i7 2600k + nVidia TITAN black
https://www.behance.net/Auror
face_off
Octane Plugin Developer
Posts: 15755
Joined: Fri May 25, 2012 10:52 am
Location: Adelaide, Australia

Could it be the card is faulty? What temps does GPU-Z report?
Win7/Win10/Mavericks/Mint 17 - GTX550Ti/GT640M
Octane Plugin Support : Poser, ArchiCAD, Revit, Inventor, AutoCAD, Rhino, Modo, Nuke
Pls read before submitting a support question
portnicki
Licensed Customer
Posts: 124
Joined: Mon Mar 10, 2014 9:28 am
Location: Warsaw

face_off wrote:Could it be the card is faulty? What temps does GPU-Z report?
I'm not saying it's impossible. Could very well be :(

My temps go up to 71-72 degrees Celsius in FurMark (as well as in Octane). They stay there until the card's driver stops and restarts (killing all rendering in the process).

I forgot to mention that after buying the card I also replaced the original cooling with a liquid one (Kraken G10 & Zalman LQ315), but I was getting the same errors.
I figured that the RAM and VRAM on the card weren't being cooled well enough by this solution, so I went back to the original cooling. But now I see it wasn't the Kraken's fault.
coilbook
Licensed Customer
Posts: 3086
Joined: Mon Mar 24, 2014 2:27 pm

Hi, same here. It happened with network rendering.
Were you able to fix the problem?

Thanks!
face_off
Octane Plugin Developer
Posts: 15755
Joined: Fri May 25, 2012 10:52 am
Location: Adelaide, Australia

Does the error occur on all scenes or just one?

715 is overheating, so I don't think it's that. 702 is defined as:

Code:

CUDA_ERROR_LAUNCH_TIMEOUT       = 702,      ///< Launch exceeded timeout
Try UNDER-clocking your card - does that help?

Paul
coilbook
Licensed Customer
Posts: 3086
Joined: Mon Mar 24, 2014 2:27 pm

face_off wrote: Does the error occur on all scenes or just one?
Thanks for the reply.

I think I finally figured it out. It seems that every time I enable all 4 cards in my Netstor Turbo box, I get freezes and errors 700, 702, 999, etc., but any 3 of the cards work fine in the Netstor.
I guess its Silver-rated 1200W PSU is not enough for non-stop rendering with 4 GTX 780 Ti SC cards.
slepy8
Licensed Customer
Posts: 377
Joined: Sun Jul 14, 2013 10:53 am

If it's Windows, change the TdrDelay value in the registry to a higher one, for example 100.
This should do it.
coilbook
Licensed Customer
Posts: 3086
Joined: Mon Mar 24, 2014 2:27 pm

slepy8 wrote: If it's Windows, change the TdrDelay value in the registry to a higher one, for example 100.
This should do it.
Will do, thank you!
slepy8
Licensed Customer
Posts: 377
Joined: Sun Jul 14, 2013 10:53 am

My advice was mainly for Portnicki ;) But you can try it too.
For those 702 or 902 or 999 errors.

I had the issue a few days ago.
The allowed GPU response time is just too short, and with some complex scenes the system restarts the display driver once the TdrDelay time has passed.
Raising TdrDelay to a higher value gives the GPU more time for its calculations.

It worked for me in a pretty similar situation.

However, 72°C is a pretty normal working temperature for standard GTX cards.
And they can work at 90°C too, so that's not the issue here.
coilbook
Licensed Customer
Posts: 3086
Joined: Mon Mar 24, 2014 2:27 pm

slepy8 wrote:My advice was mainly for Portnicki ;) But you can try it too.
Thanks a lot Slepy8!
I increased TdrDelay to 100. It seems a lot more stable.