If I render a scene with network rendering (using two machines each with 3 x GTX780) the render never finishes, it just gets stuck very near the end. In a 16000 sample/pix render it will stop at around 15900 samples. This means the frame never finishes, never saves, and doesn't move on to the next frame. If I lower the samples/pixel value in the Kernel to 15000 it will stop at around 14900.
Interestingly, this doesn't happen if I don't use network rendering. This suggests to me that network rendering sometimes can't decide which GPUs should do the last bit of rendering; it seems like a conflict, perhaps. This doesn't happen on all scenes, but it is becoming a real problem with an animated sequence that I'm working on. Is there anything I can do to solve this? Is it a known bug?
Network render frame will not finish
Forum rules
Please post only in English in this subforum. For alternate language discussion please go here http://render.otoy.com/forum/viewforum.php?f=18
Try unchecking "show RFB" in the net render window.
YOKO Studio | win 10 64 | i7 5930K GTX 3090 | 3dsmax 2022.3 |
I'm still finding this is a problem. I've rendered a few different scenes and it will render a few (sometimes a hundred or so) frames with no problem, and then a frame will render to 99% but not add any more samples, so it never finishes the frame or moves on. It isn't really freezing; it seems like it just doesn't know which GPU on the network should finish the frame. This is becoming a massive problem as I keep rendering overnight and come to work in the morning to find that a sequence is only 10% complete and has been sat there doing nothing for 10 hours.
And as a further update, I tried rendering a sequence without using network rendering, and after about 200 frames it still stopped progressing and wouldn't save a frame.
Rico_uk wrote: I'm still finding this is a problem, I've rendered a few different scenes and it will render a few (sometime a hundred or so) frames with no problem, and then a frame will render 99%, but then not add any more samples and will then not finish the frame or move on. It isn't really freezing, it seems like it just doesn't know which GPU on the network should finish the frame. This is becoming a massive problem as I keep rendering overnight and come to work in the morning to find that a sequence is only 10% complete and has been sat there doing nothing for 10 hours.
and as a further update, I tried rendering a sequence without using the network rendering and it still wouldn't progress / save a frame after about 200 frames.
That could be a hardware issue:
http://render.otoy.com/forum/viewtopic.php?f=27&t=41237
I'm almost certain it's not; we've got a 1500 W power supply in each machine and lots of cooling. Also, if I restart the render the second it stops, it carries on perfectly happily. Could it be anything else? The render isn't really stopping, it just will not finish the frame, and it always happens when the progress bar is right at the end of a frame, never anywhere else.
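Until the cause is found, the overnight case described above (a sequence sitting idle for hours) could at least be detected automatically. The following is only a minimal sketch of a stall detector, assuming the frames are saved into a single output folder; the function names, the folder path and the time limit are hypothetical, and nothing here talks to Octane or 3ds Max. What to do on a stall (send an alert, restart the render by hand) is deliberately left out.

```python
import os
import time
from typing import Optional


def newest_mtime(folder: str) -> Optional[float]:
    """Return the most recent modification time of any file in folder."""
    with os.scandir(folder) as entries:
        times = [e.stat().st_mtime for e in entries if e.is_file()]
    return max(times) if times else None


def is_stalled(last_write: Optional[float], now: float, limit_seconds: float) -> bool:
    """True when no frame has been written within the allowed time."""
    if last_write is None:
        return False  # nothing saved yet, so there is nothing to compare against
    return now - last_write > limit_seconds


def watch(folder: str, limit_seconds: float, poll_seconds: float = 60.0) -> None:
    """Poll the output folder and report once when the sequence stops advancing."""
    while True:
        if is_stalled(newest_mtime(folder), time.time(), limit_seconds):
            print(f"No new frame in {folder} for over {limit_seconds:.0f} s")
            return
        time.sleep(poll_seconds)
```

A call such as `watch(r"D:\renders\shot01", limit_seconds=1800)` (hypothetical path) would return as soon as no frame has been saved for half an hour. The limit should be set comfortably above the longest normal frame time, otherwise slow frames are reported as stalls.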
It sounds as if the slave doesn't return a result for the assigned work, i.e. the master has assigned some samples to the slave, but the slave never returned a result. Alternatively, the returned result doesn't match the local results and is discarded, which could result in similar behaviour. I can't say what exactly the problem is until I get more information:
- Do you get any error messages in the console window of the slave?
- How many samples are missing?
- How often does it happen? All the time?
- Do you have sub-sampling enabled?
- Can you reproduce the problem in the Standalone?
- The problem probably also occurs with smaller maximum sample settings. Could you verify this?
I've got more questions, but let's start with these first. Thank you.
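To illustrate the kind of failure described above, here is a toy model of a master handing out sample batches and waiting for results. It is only a sketch and not Octane's actual implementation: the `Master` class, the `simulate` helper, the batch size of 100 and the `reclaim` step are all assumptions made for illustration. It shows how a single batch that is never returned leaves the frame permanently just short of its target.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy master that hands out sample batches and counts returned results."""
    max_samples: int
    batch_size: int
    completed: int = 0
    assigned: int = 0
    next_id: int = 0
    # batch id -> (worker name, sample count) for work not yet returned
    outstanding: dict = field(default_factory=dict)

    def assign(self, worker: str):
        """Hand the next batch to a worker, or return None if nothing is left."""
        remaining = self.max_samples - self.assigned
        if remaining <= 0:
            return None
        count = min(self.batch_size, remaining)
        batch_id = self.next_id
        self.next_id += 1
        self.assigned += count
        self.outstanding[batch_id] = (worker, count)
        return batch_id

    def receive(self, batch_id: int, valid: bool = True) -> None:
        """Accept a returned batch; a discarded result frees its samples again."""
        _, count = self.outstanding.pop(batch_id)
        if valid:
            self.completed += count
        else:
            self.assigned -= count

    def reclaim(self, worker: str) -> int:
        """Take back every batch still held by a worker so it can be reassigned."""
        stale = [b for b, (w, _) in self.outstanding.items() if w == worker]
        for b in stale:
            _, count = self.outstanding.pop(b)
            self.assigned -= count
        return len(stale)

    def finished(self) -> bool:
        return self.completed >= self.max_samples


def simulate(max_samples: int = 16000, batch_size: int = 100) -> Master:
    """Alternate batches between two workers; the slave never returns the last one."""
    master = Master(max_samples, batch_size)
    workers = ["local", "slave"]
    ids = []
    while True:
        batch_id = master.assign(workers[len(ids) % 2])
        if batch_id is None:
            break
        ids.append(batch_id)
    for batch_id in ids[:-1]:  # the final batch (held by the slave) is lost
        master.receive(batch_id)
    return master


m = simulate()
print(m.completed, m.finished())  # 15900 False
```

In this model the frame only completes if the master takes the lost work back, for example with `m.reclaim("slave")` followed by assigning and receiving a fresh batch. Without some step like that, the master waits forever for samples that will never arrive.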
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Hi Abstrax, thanks for the reply, hopefully you can help to resolve this.
- I've not checked the slave console screens when it freezes, I'll do that this evening when it next freezes. (Currently rendering a long sequence.)
- Usually there are only 100 or so samples left to go when it gets stuck.
- Happens quite randomly, could be after 3 frames, could be after 500.
- I have not changed any of the sub-sampling numbers from their default.
- I haven't tried in standalone, I'm not too familiar with it.
- I am rendering to a minimum of 8,000 samples, but usually 16,000.
I would like to point out though that this has happened a couple of times even without network rendering, just using my local machine. It seems to happen less often, but I guess that with more GPUs over the network, the likelihood of one of them messing something up and causing the error is magnified.
Rico_uk wrote: Hi Abstrax, thanks for the reply, hopefully you can help to resolve this.
- I've not checked the slave console screens when it freezes, I'll do that this evening when it next freezes. (Currently rendering a long sequence.)
- Usually there are only 100 or so samples left to go when it gets stuck.
- Happens quite randomly, could be after 3 frames, could be after 500.
- I have not changed any of the sub-sampling numbers from their default.
- I haven't tried in standalone, I'm not too familiar with it.
- I am rendering to a minimum of 8,000 samples, but usually 16,000.
I would like to point out though that this has happened a couple of times even without network rendering, just using my local machine. It seems to happen less often, but I guess that with more GPUs over the network, the likelihood of one of them messing something up and causing the error is magnified.
Ok, thanks. We fixed a few bugs where some information about tonemapping and sub-sampling wasn't distributed correctly to the render threads and the slaves (both receive the same data structures). Maybe the problem was related to those bugs. We will try to get a new build out as soon as possible; please try again with that. I will also add some more logging to the slaves so we can check what's going on.
Amazing, that'd be great. Look forward to the next release, hopefully it will sort out those problems. I tend to find that when using the Octane frame buffer it is much more reliable (unsurprisingly) than the Max frame buffer, but there isn't a way to use that to render the final frames. Will it be possible soon to totally bypass the Max frame buffer and save sequences through the Octane render window inside of Max? The data that you see on the Octane frame buffer is really useful and it does seem much more stable.