Automatically close failed slave?

Maxon Cinema 4D (Export script developed by abstrax, Integrated Plugin developed by aoktar)

Moderator: aoktar


Postby PHM » Fri Nov 10, 2017 10:09 am

Hi, I'm not sure if this is specific to Cinema or to Octane in general, but we are finding that quite often a slave will crash at some point during rendering animations in the Picture Viewer, and the frame will then never complete. As soon as the crashed slave is removed, the frame completes and starts the next one.
This creates problems when running overnight, as you can have a frame freeze at 2am and only find out in the morning that the time was wasted.

The slaves are a mix of 1080ti and 980ti cards in pairs (8 slaves, 16 cards total). We have tested as much as we can and cannot find a specific trigger. It's not memory: sometimes a 1080ti will fail while the 980tis are fine.
It seems to happen most often when a slave joins a job part way through, but sometimes a slave will fail that has been working since the beginning of the job and cause the same problem.
Removing and re-adding the slave works, so there's no consistency there either. All slaves are running the same driver (382.33) and Octane 3.06 stable.

[Image: 2017-11-10 08_46_52-Picture Viewer.png] Job paused on the 5th frame for 14 hours.


As you can see, a slave crashed on the 5th frame, and the frame paused at 99.97% for 14 hours. When the slave was restarted everything ran fine again.

[Image: screenshot.png] Slave failed 5 frames in. When restarted, it worked fine and completed the job.


In this case, the error was CUDA 719, but this is not consistent: sometimes it is an error of "not receiving all information for a frame", and sometimes it simply says the "slave crashed or was stopped (CTRL-C)".

I understand there are all sorts of variables that could affect rendering so I'm not looking for a specific fix, but rather:

Is there any way to automatically force a slave daemon to quit if it fails - so that it is removed from the pool and the frame will finish?

I'd rather that the job continued a little slower overnight than froze for hours at a time.

Thanks in advance,
James
Octane 3.07.R2 | Cinema 4D 18.057
Windows 7 64bit | 64GB RAM | 8x Geforce GTX 1080ti/8x Asus Strix GTX980ti

Re: Automatically close failed slave?

Postby miohn » Mon Nov 13, 2017 12:17 pm

Hi,

I also have this problem from time to time and would like to know why it's not possible for the Master to simply continue rendering without the crashed slave.

regards
Mike

Re: Automatically close failed slave?

Postby bepeg4d » Mon Nov 13, 2017 2:50 pm

Hi guys,
unfortunately, I have tried several times to reproduce this issue without success.
If the Slave crashes, the Master continues to render here.
If the issue is not clearly reproducible, it is very difficult for the developers to find the culprit.
If you can find a scene that always behaves this way, please share it with us.
ciao beppe

Re: Automatically close failed slave?

Postby Lewis » Mon Nov 13, 2017 10:11 pm

Hi,

It's definitely not Cinema4D related. I have the same issue in LightWave network rendering through the Octane controller.

My topic is here, but sadly there's no answer/news from OTOY :(

viewtopic.php?f=23&t=63777&p=325122#p325122
--
Lewis
http://www.ram-studio.hr
Skype - lewis3d
ICQ - 7128177

WS AMD TRPro 3955WX, 256GB RAM, Win10, 2 * RTX 4090, 1 * RTX 3090
RS1 i7 9800X, 64GB RAM, Win10, 3 * RTX 3090
RS2 i7 6850K, 64GB RAM, Win10, 2 * RTX 4090

Re: Automatically close failed slave?

Postby PHM » Wed Nov 15, 2017 9:43 am

I've managed to make a scene that uses just a bit too much memory for the 6 GB cards, and this does reproduce the error, but not consistently.
Most of the time the slaves fail and the render continues, but sometimes - like in this screenshot - the render gets stuck.

[Image: 2017-11-15 09_40_29-Log.png]


You can see that:
The slaves don't show as failed (even though three of them did fail because of lack of memory)
Nothing shows in the log
The render is stuck: it has been showing 5 seconds left on the frame for over 15 minutes.

On the three slaves that failed, this is the error:

[Image: 2017-11-15 09_32_02-PHM-COW01 - TeamViewer.png]


While I have managed to force this to happen, as mentioned previously it's not the same error every time. It doesn't seem to be related to any single issue, but occasionally the slave that crashes doesn't seem to talk to the machine in charge of rendering, which waits indefinitely for a result that's not coming.

What would be great is a timeout setting on the master, so that if it receives no result within a set time (2 minutes, for example) it excludes the slave and carries on. Alternatively, a command-line option on the slave daemon could make it quit when it encounters an error. Is either of these possible?
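To make the requested master-side timeout concrete, here is a minimal Python sketch of the idea. It is not Octane's network-rendering code. The class `SlaveWatchdog` and its methods are hypothetical, and the sketch assumes the master can record a timestamp whenever a slave returns a result or heartbeat.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SlaveWatchdog:
    """Hypothetical master-side tracker (not an Octane API): a slave that has
    reported nothing for longer than timeout_s is excluded from the pool."""

    timeout_s: float = 120.0  # the 2-minute example from the post above
    last_seen: Dict[str, float] = field(default_factory=dict)
    excluded: Set[str] = field(default_factory=set)

    def register(self, slave: str, now: float) -> None:
        # Adding (or re-adding a restarted) slave starts its timer fresh.
        self.last_seen[slave] = now
        self.excluded.discard(slave)

    def report(self, slave: str, now: float) -> None:
        # Any result or heartbeat from an active slave resets its timer.
        if slave not in self.excluded:
            self.last_seen[slave] = now

    def check(self, now: float) -> List[str]:
        """Exclude every slave silent for longer than timeout_s.
        Returns the slaves excluded by this call, sorted by name."""
        newly = sorted(
            s for s, t in self.last_seen.items()
            if s not in self.excluded and now - t > self.timeout_s
        )
        self.excluded.update(newly)
        return newly

    def active(self) -> List[str]:
        # The slaves the master would keep waiting on.
        return sorted(s for s in self.last_seen if s not in self.excluded)
```

For example, with two slaves registered at t=0, if only `slave-a` reports at t=100, then `check(130.0)` excludes `slave-b` and the master carries on with `slave-a` alone. A real implementation would also have to reassign the excluded slave's unfinished work to the remaining slaves, and this sketch does not attempt that.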

Re: Automatically close failed slave?

Postby rleuchovius » Fri Nov 17, 2017 3:02 pm

We have the same issue at our studio from time to time. The render stops at 99% of one frame and then gets stuck, halting the rest of the night render. Some sort of timeout function would be highly appreciated.

Re: Automatically close failed slave?

Postby abstrax » Fri Nov 24, 2017 12:17 am

Just to let you know: I didn't have time yet, but will have an in-depth look into the reported problem next week. If there is any more information that might be relevant, feel free to add it to this thread. Thanks a lot.
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra

Re: Automatically close failed slave?

Postby abstrax » Fri Nov 24, 2017 4:12 am

James, I just noticed that you are using version 3.06. Could you (when you've got time) update to version 3.07 to see if the problem is gone. I don't think that the update will solve your problem, but it's worth a try. Thanks.

Re: Automatically close failed slave?

Postby Lewis » Fri Nov 24, 2017 6:38 am

abstrax wrote: "James, I just noticed that you are using version 3.06. Could you (when you've got time) update to version 3.07 to see if the problem is gone. I don't think that the update will solve your problem, but it's worth a try. Thanks."


You are right, 3.07 will not fix this.
I was using 3.07.1 (LW version) but had the same issue: if one slave dies (for whatever reason), the rest just stop and wait until I hit continue :(

Thanks

Re: Automatically close failed slave?

Postby thanulee » Fri Nov 24, 2017 10:16 am

Same issue here with slave crashes. Awaiting a response. Thanks.