10 GPUs - 53 sec, 20 GPUs - 43???

coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

Hi, we have two slaves, each holding 10 GPUs.
One has 2080 Tis, the other Titan X Pascals.

10 x 2080 Tis take 53 seconds per frame, but 20 GPUs take 43. Shouldn't it be around 30? I am giving the system almost twice the power.
Lewis
Licensed Customer
Posts: 1102
Joined: Tue Feb 05, 2013 6:30 pm
Location: Croatia
Contact:

You forgot to account for the speed of the GPUs:

1. A 2080 Ti has an OB score of 305-310, so 10 * 307 = 3070 OB total
2. A Titan X Pascal has an OB score of 250, so 10 * 250 = 2500 OB total

So in percentages, 2500 + 23% ≈ 3075, which means your 2080 Ti setup should be 22-23% faster with the same number of GPUs simply because the GPUs themselves are faster (2080 Ti vs Titan X Pascal). Also, the speedup is not exactly linear, especially because your render slave's GPUs start rendering later: the main workstation needs to send all the data over the network, so if you measure the time on a still/first frame there will be a noticeable lag until the slave receives all the data for the first time.
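As a sanity check, here is that arithmetic as a small Python sketch. The per-card scores are the approximate OctaneBench values quoted above; that render speed scales linearly with OB score is the assumption behind the whole comparison.

```python
# OB scores per card, as quoted in this post (approximate).
ob_2080ti = 307       # 2080 Ti: ~305-310
ob_titan_xp = 250     # Titan X Pascal

slave_2080ti = 10 * ob_2080ti    # 3070 OB total
slave_titan = 10 * ob_titan_xp   # 2500 OB total

# Relative speed of the 2080 Ti slave over the Titan X Pascal slave.
print(f"{slave_2080ti / slave_titan - 1:.1%}")   # 22.8% -> the 22-23% above
```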

P.S. What motherboard do you have that is able to run 10 GPUs in one system? Thanks.
--
Lewis
http://www.ram-studio.hr
Skype - lewis3d
ICQ - 7128177

WS AMD TRPro 3955WX, 256GB RAM, Win10, 2 * RTX 4090, 1 * RTX 3090
RS1 i7 9800X, 64GB RAM, Win10, 3 * RTX 3090
RS2 i7 6850K, 64GB RAM, Win10, 2 * RTX 4090
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Lewis is right; I have experienced the same kind of thing. In general it takes more time to feed data to and from more GPUs, so scaling is not linear.
There are diminishing returns after a certain number of cards. You may find it close to linear at 13, 14, 15 cards, and then dropping off at 17, 18, 19, for example...
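A minimal sketch of that diminishing-returns curve, assuming a simple "fixed overhead + scalable work" model. The 30 s / 230 s split below is purely illustrative, chosen so that 10 GPUs land near the 53 s reported at the top of the thread; it is not measured from this hardware.

```python
# Illustrative model: each frame has a fixed serial cost (scene transfer,
# kernel setup, network) plus work that divides across the GPUs, so
# doubling the GPU count only halves the scalable part.
def frame_time(n_gpus, serial_s=30.0, parallel_total_s=230.0):
    return serial_s + parallel_total_s / n_gpus

for n in (10, 20, 40):
    print(n, round(frame_time(n), 1))   # 10 -> 53.0, 20 -> 41.5, 40 -> 35.8
```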

But another giant factor to troubleshoot in a case like this is the hardware arrangement.
Are they on extenders at 1x or 4x, or are they all on the same motherboard at 8x+?
This makes a huge difference as well. So Lewis is especially right on that point too: what are the motherboard and the OS?
Linux and Win 10 do not handle and process multi-GPU data the same way at smaller and smaller time scales.

I have found that re-tweaking the hardware arrangement can definitely lead to different speeds, so yours may not yet be optimized as far as raw speed goes.
(LOL... I always go crazy over a few seconds' difference, because on thousands of frames it adds up!)
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

Thank you all.
We use the Supermicro SYS-4028GR-TRT2.
Also, processing time is only 5 seconds per frame.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm
Contact:

Good day,

Network speed is important. I would check what you have and whether you can upgrade it (10G is recommended if you go with higher resolutions).

It would also be good to open something like MSI Afterburner and look at how the GPUs are utilized compared to your main machine (I bet you are under-utilizing those cards).

Last but not least, there is this parameter: Minimize Network Traffic (in the Kernel settings under the Sampling tab). This could help the nodes talk more efficiently.
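As a sketch of that utilization check without MSI Afterburner: nvidia-smi ships with the NVIDIA driver and can be polled while a frame renders on the slave. Utilization that stays well below ~95% during the render phase suggests the cards are waiting on data.

```python
import subprocess, time

# Sample GPU utilization once per second for ~10 seconds while rendering.
for _ in range(10):
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout
    print(out.strip())
    time.sleep(1)
```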
coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

glimpse wrote:
Good day,

Network speed is important. I would check what you have and whether you can upgrade it (10G is recommended if you go with higher resolutions).

It would also be good to open something like MSI Afterburner and look at how the GPUs are utilized compared to your main machine (I bet you are under-utilizing those cards).

Last but not least, there is this parameter: Minimize Network Traffic (in the Kernel settings under the Sampling tab). This could help the nodes talk more efficiently.
Thanks. It is 10 Gb networking, with 16x PCIe to each GPU, and we only render animation at 1080p. So about 3 seconds per frame is lost to data transfer: raw rendering time is 50 sec for 10 GPUs and 40 sec for 20 GPUs.
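For comparison, here is what perfectly linear, OB-weighted scaling would predict from those raw times (scores as in Lewis's post earlier; that throughput tracks OB score is an assumption):

```python
ob_10 = 10 * 307            # 2080 Ti slave alone: ~3070 OB
ob_20 = ob_10 + 10 * 250    # plus the Titan X Pascal slave: ~5570 OB

# If the raw 50 s time scaled perfectly with total OB score:
expected_20 = 50 * ob_10 / ob_20
print(round(expected_20, 1))   # ~27.6 s expected vs ~40 s measured; the gap
                               # is per-frame work that does not shrink with GPUs
```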
coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

I also noticed the denoiser makes rendering way slower (during the actual rendering stage, not the denoising stage). Hopefully this can be addressed in the future.
coilbook
Licensed Customer
Posts: 3032
Joined: Mon Mar 24, 2014 2:27 pm

So I did some tests with 20 GPUs and 10 GPUs, rendering just letters on the screen. Both the 20 and 10 GPU results were 2 seconds per frame, so most of the time went towards sending data and processing. I am not sure how they got Brigade to do 60 frames per second.
pixym
Licensed Customer
Posts: 598
Joined: Thu Jan 21, 2010 4:27 pm
Location: French West Indies

So the most efficient way is to render separate ranges of frames on each machine, without net rendering…
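A minimal sketch of that scheme, splitting a frame range across the machines. The hostnames and the printed command are hypothetical placeholders, not Octane's actual CLI:

```python
# Give each machine its own contiguous frame range, so each scene upload
# happens once per machine instead of being coordinated every frame.
frames = list(range(1, 1001))                    # frames 1-1000
machines = ["slave-2080ti", "slave-titanxp"]     # hypothetical hostnames

chunk = len(frames) // len(machines)
for i, host in enumerate(machines):
    start = frames[i * chunk]
    end = frames[-1] if i == len(machines) - 1 else frames[(i + 1) * chunk - 1]
    print(f"{host}: render frames {start}-{end}")
# slave-2080ti: render frames 1-500
# slave-titanxp: render frames 501-1000
```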
Work Station : MB ASUS X299-Pro/SE - Intel i9 7980XE (2,6ghz 18 cores / 36 threads) - Ram 64GB - RTX4090 + RTX3090 - Win10
Net render : MB Asus Pro WS W790E-SAGE SE - XEON - 128GB - 2 x RTX 3090 - 3 x RTX 2080TI
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm
Contact:

It is pointless to test the difference with frames that render in 2 seconds. Test with frames that render in around a minute and you will surely see the difference.
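Using the same illustrative fixed-overhead model as earlier in the thread makes the point concrete (the 1.8 s overhead figure is an assumption, not a measurement):

```python
def frame_time(n_gpus, serial_s, parallel_total_s):
    # Fixed per-frame overhead plus work that divides across the GPUs.
    return serial_s + parallel_total_s / n_gpus

# ~2 s frames: overhead dominates, so 10 and 20 GPUs look the same.
print(round(frame_time(10, 1.8, 2.0), 2), round(frame_time(20, 1.8, 2.0), 2))     # 2.0 1.9
# ~1 min frames: the scalable part dominates, so the difference is obvious.
print(round(frame_time(10, 1.8, 600.0), 2), round(frame_time(20, 1.8, 600.0), 2)) # 61.8 31.8
```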