Node crash halts render
Posted: Sun Nov 13, 2016 10:27 pm
by BorisGoreta
With the latest 3.04.3.0 version, if one of the nodes drops, the frame never finishes.
Re: Node crash halts render
Posted: Mon Nov 14, 2016 9:31 am
by juanjgon
Are you talking about nodes using the Octane native network rendering? This can be a problem in Octane itself. I'm not sure whether Standalone or the other plugins have the same problem ...
-Juanjo
Re: Node crash halts render
Posted: Mon Nov 14, 2016 9:55 am
by BorisGoreta
Yes, with the Octane native network rendering. I have a dodgy node which can't survive a whole night of rendering, so every time the render stops in the middle.
Re: Node crash halts render
Posted: Wed Nov 16, 2016 8:55 pm
by abstrax
BorisGoreta wrote: Yes, with the Octane native network rendering. I have a dodgy node which can't survive a whole night of rendering, so every time the render stops in the middle.
There have been no relevant changes here, so I don't think it's specific to particular versions. But let's investigate the issue anyway. Does the dodgy slave crash or just hang? What is the output on the terminal?
Re: Node crash halts render
Posted: Wed Nov 16, 2016 10:46 pm
by BorisGoreta
The slave doesn't crash, it hangs. It reports the same error, with some code, for all devices. I have identified the dodgy GPU and removed it from the system, and now it works fine.
Re: Node crash halts render
Posted: Wed Nov 16, 2016 11:57 pm
by abstrax
BorisGoreta wrote: The slave doesn't crash, it hangs. It reports the same error, with some code, for all devices. I have identified the dodgy GPU and removed it from the system, and now it works fine.
Hmm, OK. In theory (and as far as I have tested it), the slave should inform the master when all of its devices have failed, and the net render master should then return all unfinished assignments to the render target, which will then redistribute them. In case you or someone else wants this issue investigated further, it's probably best to enable logging on the master and the slaves and then send me the log files the next time the problem occurs. To enable logging, you have to copy the following file into the directory with the Octane slave binary (on the slaves) and the Standalone / Octane DLL binary (on the master) and then make sure that the applications are restarted:
Re: Node crash halts render
Posted: Thu Nov 17, 2016 12:07 am
by BorisGoreta
OK, I will do that. In my experience, every time the node halted or restarted, rendering would just stop. I would have to press abort and restart the render. This usually happens overnight at some point, while I am not monitoring the rendering progress. It's a huge issue for me, since I can't go to sleep trusting that the sequence will be finished in the morning.
Re: Node crash halts render
Posted: Thu Nov 17, 2016 6:21 am
by Lewis
Yes, that's a big issue, and as Abstrax said, that shouldn't happen. I also thought that this problem was solved and that, no matter how many GPUs might fail (for whatever reason overnight), the render should NOT stop as long as at least one GPU in the network is still live/rendering. Otherwise, for all our deadlines overnight or while we are not in front of the computer babysitting it, it is a big mystery whether they will finish on time or not finish at all.
Re: Node crash halts render
Posted: Sun Dec 04, 2016 9:03 pm
by BorisGoreta
This is happening a lot again with one of the nodes. What happens is that the node's command window halts the render completely. If I kill this node window by pressing the X at the top right of the window, the render continues normally.
This is very easy to test: just cut power to one of the GPUs in the node and it will halt the render.
Why isn't there some heartbeat test for the nodes? If a node doesn't reply in a reasonable amount of time, just disregard it for subsequent frames and continue rendering with what you've got left.
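To make the idea concrete, here is a minimal sketch of such a heartbeat check in Python. It is not how Octane's network rendering works internally; `HeartbeatMonitor`, `reassign`, the node names and the 30 second timeout are all hypothetical and only illustrate "drop the silent node and hand its tiles back".

```python
import time


class HeartbeatMonitor:
    """Remembers when each node last replied and reports the silent ones.

    Hypothetical helper for illustration; not part of any Octane API.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed fixed timeout, in seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def beat(self, node):
        """Record that `node` has just replied."""
        self.last_seen[node] = self.clock()

    def dead_nodes(self):
        """Return the nodes that have been silent for longer than the timeout."""
        now = self.clock()
        return [n for n, t in self.last_seen.items() if now - t > self.timeout_s]


def reassign(assignments, dead, pending):
    """Move the unfinished tiles of dead nodes back to the pending queue."""
    for node in dead:
        pending.extend(assignments.pop(node, []))
    return pending
```

The master would call `beat()` whenever a node replies and poll `dead_nodes()` periodically. Anything returned by `reassign()` can then be handed to the nodes that are still alive.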
Re: Node crash halts render
Posted: Sun Dec 04, 2016 9:12 pm
by abstrax
BorisGoreta wrote: This is happening a lot again with one of the nodes. What happens is that the node's command window halts the render completely. If I kill this node window by pressing the X at the top right of the window, the render continues normally.
This is very easy to test: just cut power to one of the GPUs in the node and it will halt the render.
Why isn't there some heartbeat test for the nodes? If a node doesn't reply in a reasonable amount of time, just disregard it for subsequent frames and continue rendering with what you've got left.
There is a heartbeat to detect deadlocks. Regarding generic timeouts: What is a reasonable amount of time between responses? There are so many components in play here that can delay communication and some scenes really take a long time to render a tile...
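One possible answer to the "what is reasonable" question is to derive the timeout from how long tiles have actually taken in the current render instead of using a fixed constant. The sketch below only illustrates that idea and does not describe Octane's behaviour; `adaptive_timeout`, the factor of 3 and the 60 second floor are assumptions chosen for the example.

```python
def adaptive_timeout(tile_times_s, factor=3.0, floor_s=60.0):
    """Timeout based on observed tile durations (hypothetical helper).

    tile_times_s: durations in seconds of tiles already finished in this render.
    factor:       assumed safety margin over the slowest tile seen so far.
    floor_s:      assumed minimum, also used while there is no history yet.
    """
    if not tile_times_s:
        return floor_s
    return max(floor_s, factor * max(tile_times_s))
```

A heavy scene whose slowest tile took 400 seconds would get a 1200 second timeout, while a light scene stays at the floor. This does not remove the problem entirely: the first tile of a very slow scene has no history, so the floor still has to be chosen generously.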