Node crash halts render

Newtek Lightwave 3D (exporter developed by holocube, Integrated Plugin developed by juanjgon)

Moderator: juanjgon

BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

With the latest 3.04.3.0 version, if one of the nodes drops out, the frame never finishes.
juanjgon
Octane Plugin Developer
Posts: 8867
Joined: Tue Jan 19, 2010 12:01 pm
Location: Spain

You are talking about nodes using the Octane native network rendering, right? This could be a problem in Octane itself. I'm not sure whether Standalone or the other plugins have the same problem ...

-Juanjo
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

Yes, with the Octane native network rendering. I have a dodgy node which can't survive a whole night of rendering, so every time the render stops in the middle.
abstrax
OctaneRender Team
Posts: 5509
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

BorisGoreta wrote: Yes, with the Octane native network rendering. I have a dodgy node which can't survive a whole night of rendering, so every time the render stops in the middle.
There have been no relevant changes here, so I don't think it's specific to a particular version. But let's investigate the issue anyway. Does the dodgy slave crash or just hang? What is the output on the terminal?
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

The slave doesn't crash, it hangs: it reports the same error, with some code, for all devices. I have identified the dodgy GPU and removed it from the system, and now it works fine.
abstrax
OctaneRender Team
Posts: 5509
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

BorisGoreta wrote: The slave doesn't crash, it hangs: it reports the same error, with some code, for all devices. I have identified the dodgy GPU and removed it from the system, and now it works fine.
Hmm, OK. In theory (and as far as I have tested it), the slave should inform the master when all of its devices have failed, and the net render master should then return all unfinished assignments to the render target, which will then redistribute them. If you or someone else wants this issue investigated further, it's probably best to enable logging on the master and the slaves and then send me the log files the next time the problem occurs. To enable logging, copy the following file into the directory with the Octane slave binary (on the slaves) and the Standalone / Octane DLL binary (on the master), and then make sure that the applications are restarted:
octane_log_flags.txt (attachment, 74 bytes)
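For readers trying to picture the intended behaviour described above, here is a minimal sketch of the "return unfinished assignments and redistribute" idea. It is not Octane code: the class names `RenderTarget` and `NetRenderMaster`, their methods, and the notion of a numbered tile are all hypothetical and exist only to illustrate the bookkeeping.

```python
from collections import deque


class RenderTarget:
    """Hypothetical stand-in for the render target that hands out work."""

    def __init__(self, tiles):
        self.pending = deque(tiles)   # work nobody is rendering yet
        self.finished = set()         # work that has been completed

    def take(self):
        # Hand out the next pending tile, or None if nothing is left.
        return self.pending.popleft() if self.pending else None

    def give_back(self, tiles):
        # Unfinished tiles go back into the queue for other slaves.
        self.pending.extend(tiles)


class NetRenderMaster:
    """Hypothetical master that tracks which slave holds which tiles."""

    def __init__(self, target):
        self.target = target
        self.assignments = {}         # slave name -> set of tiles in flight

    def assign(self, slave):
        tile = self.target.take()
        if tile is not None:
            self.assignments.setdefault(slave, set()).add(tile)
        return tile

    def tile_done(self, slave, tile):
        self.assignments[slave].discard(tile)
        self.target.finished.add(tile)

    def all_devices_failed(self, slave):
        # Assumed message from a slave whose devices have all failed:
        # forget the slave and return its unfinished work to the target.
        unfinished = self.assignments.pop(slave, set())
        self.target.give_back(sorted(unfinished))
        return unfinished
```

In this model the frame can still finish as long as one working slave keeps calling `assign`. The hang reported in this thread would correspond to `all_devices_failed` never being called for the broken node, so its tiles are never handed back.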
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

OK, I will do that. In my experience, every time the node halted or restarted, rendering would just stop, and I would have to press abort and restart the render. This usually happens overnight at some point while I am not monitoring the rendering progress. It's a huge issue for me, since I can't go to sleep confident that the sequence will be finished in the morning.
Lewis
Licensed Customer
Posts: 1101
Joined: Tue Feb 05, 2013 6:30 pm
Location: Croatia

Yes, that's a big issue, and as Abstrax said, it shouldn't happen. I also thought that this problem was solved and that, no matter how many GPUs fail overnight (for whatever reason), the render should NOT stop as long as at least one GPU in the network is still alive and rendering. Otherwise, for every deadline that depends on rendering overnight, or while we are not in front of the computer babysitting it, it's a big mystery whether the job will finish on time or not finish at all.
--
Lewis
http://www.ram-studio.hr
Skype - lewis3d
ICQ - 7128177

WS AMD TRPro 3955WX, 256GB RAM, Win10, 2 * RTX 4090, 1 * RTX 3090
RS1 i7 9800X, 64GB RAM, Win10, 3 * RTX 3090
RS2 i7 6850K, 64GB RAM, Win10, 2 * RTX 4090
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

This is happening a lot again with one of the nodes. What happens is that the node's command window halts the render completely. If I kill this node window by pressing the X at the top right of the window, the render continues normally.

This is very easy to test: just cut power to one of the GPUs in the node and it will halt the render.

Why isn't there some heartbeat test for the nodes? If a node doesn't reply in a reasonable amount of time, just exclude it from subsequent frames and continue rendering with what you've got left.
abstrax
OctaneRender Team
Posts: 5509
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

BorisGoreta wrote: This is happening a lot again with one of the nodes. What happens is that the node's command window halts the render completely. If I kill this node window by pressing the X at the top right of the window, the render continues normally.

This is very easy to test: just cut power to one of the GPUs in the node and it will halt the render.

Why isn't there some heartbeat test for the nodes? If a node doesn't reply in a reasonable amount of time, just exclude it from subsequent frames and continue rendering with what you've got left.
There is a heartbeat to detect deadlocks. Regarding generic timeouts: what is a reasonable amount of time between responses? There are so many components in play here that can delay communication, and some scenes really take a long time to render a tile...
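To make the timeout question concrete, here is a rough sketch of a heartbeat check that is kept separate from tile completion, so a slow tile does not by itself look like a dead node. This is only an illustration, not how Octane's heartbeat works: `HeartbeatMonitor`, its methods, and the 30-second figure in the usage note are assumptions made up for the example.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor: nodes ping periodically, independent of tiles."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s    # how long silence is tolerated
        self.clock = clock            # injectable so it can be tested
        self.last_seen = {}           # node name -> time of last ping

    def beat(self, node):
        # Called whenever any message arrives from the node.
        self.last_seen[node] = self.clock()

    def stale_nodes(self):
        # Nodes that have been silent for longer than the timeout.
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )

    def drop(self, node):
        # Stop tracking a node that was excluded from the render.
        self.last_seen.pop(node, None)
```

A master could create `HeartbeatMonitor(timeout_s=30.0)`, call `beat` on every message, and periodically exclude whatever `stale_nodes()` returns. The hard part raised above remains: the timeout has to be chosen per setup, because network delays and a busy node can also cause long gaps between pings.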