CUDA Error 700 questions

Posted: Sat Jun 18, 2016 10:55 pm
by Lewis
Hi!

I have questions about CUDA 700 errors. I'm getting them occasionally in Octane 3.0.20 (final LW release). It's not scene-dependent or easily reproducible, but it seems to need scenes with longer render times (like arch-viz interiors). Basically, one of the GPUs crashes and then rendering stops, although they are not overheating at all: my GPU temps are 65-72°C, depending on position in the case, at 23-24°C ambient. It usually happens on longer still renders, 1-3 h per frame.

What drives me crazy is that when one of the GPUs fails, Octane loses the samples and STOPs rendering altogether, which makes night renders very problematic 'coz I get up and have maybe 1, maybe 2, maybe no frames done :(. I'm OK with the fact that it has to lose samples if a GPU fails, but since each GPU has all the data anyway and they render in parallel, why does it have to stop rendering and lose ALL of them :(? Can't it be made to continue rendering until the last of the GPUs fails (in the worst-case scenario, of course)?
Can you guys make it "smarter" so it keeps rendering until the "last man standing" (i.e. the last GPU still rendering)? Lose the samples from the failed GPU and the render will slow down, but at least it will render all the queued frames instead of giving up completely.
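
As a rough illustration of the "last man standing" behaviour requested above, here is a minimal Python sketch. It is purely hypothetical and says nothing about how Octane is actually implemented: `GpuFailure`, `render_frame` and the `render_batch` callback are made-up names, and a real renderer would also have to deal with the failed device's CUDA context. The only point is that a failure on one device removes that device from the pool while the others keep accumulating samples.

```python
class GpuFailure(Exception):
    """Hypothetical stand-in for a device-level error such as CUDA error 700."""


def render_frame(devices, samples_needed, render_batch):
    """Accumulate samples across devices, dropping any device that fails.

    devices        -- list of device ids (hypothetical)
    samples_needed -- target sample count for the frame
    render_batch   -- callable(device) -> samples produced; may raise GpuFailure
    Returns (samples_done, surviving_devices).
    """
    alive = list(devices)
    done = 0
    while done < samples_needed and alive:
        for dev in list(alive):  # iterate over a copy so we can remove safely
            try:
                done += render_batch(dev)
            except GpuFailure:
                # Only the failed device's in-flight batch is lost;
                # the remaining devices carry on with the frame.
                alive.remove(dev)
            if done >= samples_needed:
                break
    return done, alive
```

With three devices where one fails part-way through, the frame would still reach its sample target on the remaining two, just more slowly. If every device fails, the loop ends early and the caller can see that from the returned sample count.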

Please make it do so in the 3.x updates; it's very frustrating to go to sleep and find out in the morning that the job is not done :(.

P.S. I've tested my system and it's perfectly stable (all GPUs work fine) in Octane 2.25 for days. I've rendered frames/scenes for 48h straight in 2.25, but I can't get that stability in 3.x so far.

Thanks

Re: CUDA Error 700 questions

Posted: Mon Jun 20, 2016 8:39 am
by bepeg4d
Hi Lewis,
Do your GPUs run at different PCIe speeds?
Do you have risers?
Is the GPU that fails always the same one, or is it different each time?
Could you try leaving the "use for tonemap" option active only for the GPU on the faster PCIe slot (x16 or x8), without a riser?
ciao beppe

Re: CUDA Error 700 questions

Posted: Mon Jun 20, 2016 9:17 am
by Lewis
Hi!
Thanks for your interest, I appreciate it.

Yes, I have PCI-E USB risers, BUT it's not only the GPUs on those that show the CUDA 700 error. It happened again yesterday on an overnight render: GPU 1 failed (before that it was GPU 3), so it's not always the same GPU. And I've turned off tonemap for all GPUs but one (the one connected directly to the MB; otherwise I was getting BSODs when tonemap was on for the extender GPUs).

BUT after 2 failures in 2 days I rendered the same frame range (11 stills) in Octane 2.25 without issues (14.5 hours in total, with GPU usage constantly at 99% compared to 70-95% in v3.0). No GPU failed and everything rendered successfully (although a bit slower in 2.25), so that's why I'm sure it's not a GPU issue but an Octane 3.x problem :(. I can't really count on Octane 3.0 to finish the job, so I must go back to 2.25 for deadline projects.

The last stable Octane 3.x for me was 3.0.6 Alpha. After 3.0.10 I started getting crashes and CUDA 700 errors :(.

Re: CUDA Error 700 questions

Posted: Wed Aug 03, 2016 4:46 pm
by Yan
Same error here....sick of it :(
Night renders are a pain with this error.

Re: CUDA Error 700 questions

Posted: Thu Aug 04, 2016 8:27 am
by bepeg4d
Hi guys,
I'm guessing, but are you sure it's not an overclocking/overheating issue?
Have you tried downclocking a bit, or changing the fan curve to something more aggressive?
ciao beppe