Best Practices For Building A Multiple GPU System

Mon Jan 11, 2016 9:46 pm

smicha wrote:You are the light, Tutor.

But just a tiny, tiny, tiny light, for he who made us all shall always shine most brightly.

Mon Jan 11, 2016 10:19 pm

@Tutor, (I won't quote your post, not to fill the thread pages... Kiddin')
Nice to hear about the path you took. Thanx for sharing.
When I find the time, I will find my path (regarding C4D)...
Greetz,

Thu Jan 14, 2016 7:41 pm

Time for an OctaneBench score of close to 5,000 or more if Pascals are at least twice as fast as Titan Xs when SomeRichOne (not me) buys a One Stop Systems GPUltima PetaFlop Compute Platform and loads it with 16 top-of-the-line Pascals, tweaks them and Octane benches them - if Otoy raises the GPU license limit sufficiently and there aren't any IO space issues [150x2=300; 16x300=4,800]. Just imagine what the score would be if 16 Pascals are much more than twice as fast as 16 Titan Xs - say "3x or 4x." Now imagine what the score would be if there are dual GPU Pascals and there aren't any IO space issues. Unfortunately for me, it looks like 15-16 physical cards is the most my self-builds will hold. I'm currently working out the IO space issues involved in using 21 GPU processors - 6 x dual GPU cards and 9 x single GPU cards. I'll continue to post updates.
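
For anyone who wants to play with those numbers, here is a rough sketch of the arithmetic in the bracket above. It assumes roughly 150 OctaneBench points per Titan X (the figure used in the post) and perfectly linear scaling across GPUs, which real rigs may not reach. The function names are made up for illustration.

```python
# Back-of-the-envelope projection only; every input is an assumption from the post.
TITAN_X_SCORE = 150  # approximate OctaneBench score of one Titan X, per the post


def projected_score(cards, speedup, gpus_per_card=1, base=TITAN_X_SCORE):
    """Projected total score, assuming perfectly linear scaling across GPUs."""
    return cards * gpus_per_card * base * speedup


def gpu_processors(dual_cards, single_cards):
    """Count GPU processors in a mix of dual-GPU and single-GPU cards."""
    return 2 * dual_cards + single_cards


if __name__ == "__main__":
    for speedup in (2, 3, 4):
        print(f"16 cards at {speedup}x a Titan X: {projected_score(16, speedup):,}")
    print(f"16 dual-GPU cards at 2x: {projected_score(16, 2, gpus_per_card=2):,}")
    print("GPU processors in 6 dual + 9 single cards:", gpu_processors(6, 9))
```

The 2x case reproduces the 4,800 figure, and the 6 dual plus 9 single card mix comes to the 21 GPU processors mentioned.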

Fri Jan 15, 2016 5:31 am

Tutor, do you notice higher viewport lag (rotate, pan, zoom) with Octane 3 Alpha (1, 2, or 3) standalone as you raise the # of GPUs closer to 12? In my case, I see no lag at 1, 2, 3, or 4 GPUs, start to see it at 5 through 9, and then 10, 11, and 12 are jagged, like a low frame rate. Of course render speed increases with more GPUs, but OTOY's developers already admit V3 is slower than V2...plus at higher GPU counts the viewport lag sucks. What is interesting is that the same scene will also lag on V2 with more GPUs, but at a way lower pace. Like, V3's 4 GPU lag = V2's 12 GPU lag. Also, with any 1 GPU, doesn't matter if it's the X or any of the 12 Zs, there is no lag on V3. I could postulate that Octane 3 taxes my machine more (CPU?) than V2 did as GPU count goes higher. I have asked OTOY's developer Abstrax about this, but OTOY doesn't comment on it. I get the feeling that they don't know what they are supposed to say. It's not like everyone has 7-8 GPUs (forget 12...), so they can't test, just speculate. I am wondering how powerful even a mighty Pascal would be on V3 unless your system is super jacked w/ CPU, which leads me to think I ought to overclock my CPU and find out.

Looking at Octane 3 new features also makes me leery...I see Adobe's paws starting to creep in. I fear them, they turn everything cloud. I don't know how many times I saw them disintegrate the functionality of their good iPad apps.

Sat Jan 16, 2016 3:17 pm

Just a note that I have since tested my rig with the CPU overclocked from 3.2 GHz to 4.6 GHz. This had no effect at all on the lag in Octane Render 3. I tested both with only one GPU connected and with all 13 GPUs connected (1 X + 12 Z).
I also tested different Max Tile Samples settings, which had no apparent effect. Maybe the lag could then be linked to PCIe speed (x16 vs. x8, x4, x1). Tom G had suggested something along these lines in the Developmental build forum.

But I do note 2 things that lead me to believe it is more software specific:
1- at one GPU at 1x (Titan Z), I see no lag on V3
2- at 12 GPU all at 1x, on Octane Render 2, I see very little lag.

This is why I am interested to see what Tutor sees when he runs standalone in a project while 10 or more GPUs are connected, vs. 1. If it is only me, then I'm gonna be inclined to troubleshoot my own rig. But if Tutor sees it too, then I can bring it to the developers' attention that there is likely a hitch with the software.

Sun Jan 17, 2016 9:19 am

Notiusweb wrote:Tutor, do you notice higher viewport lag (rotate, pan, zoom) with Octane 3 Alpha (1, 2, or 3) standalone as you raise the # of GPUs closer to 12? In my case, I see no lag at 1, 2, 3, or 4 GPUs, start to see it at 5 through 9, and then 10, 11, and 12 are jagged, like a low frame rate. Of course render speed increases with more GPUs, but OTOY's developers already admit V3 is slower than V2...plus at higher GPU counts the viewport lag sucks. What is interesting is that the same scene will also lag on V2 with more GPUs, but at a way lower pace. Like, V3's 4 GPU lag = V2's 12 GPU lag. Also, with any 1 GPU, doesn't matter if it's the X or any of the 12 Zs, there is no lag on V3. I could postulate that Octane 3 taxes my machine more (CPU?) than V2 did as GPU count goes higher. I have asked OTOY's developer Abstrax about this, but OTOY doesn't comment on it. I get the feeling that they don't know what they are supposed to say. It's not like everyone has 7-8 GPUs (forget 12...), so they can't test, just speculate. I am wondering how powerful even a mighty Pascal would be on V3 unless your system is super jacked w/ CPU, which leads me to think I ought to overclock my CPU and find out.

Looking at Octane 3 new features also makes me leery...I see Adobe's paws starting to creep in. I fear them, they turn everything cloud. I don't know how many times I saw them disintegrate the functionality of their good iPad apps.

I haven't tried Octane 3 Alpha yet, but when I do try it I'll let you know my experience(s). I have read about it and the following stood out to me as a warning that CPU(s) and system memory will play greater roles (and may need to be heftier) and that multi-GPU systems and rendering speeds may take hits:

viewtopic.php?f=33&t=51679
"... To solve these issues we moved the film buffer into host memory. Doesn't sound exciting, but has some major consequences. The biggest one is that now Octane has to deal with a huge amount of data the GPUs produce. Especially in multi-GPU setups or when network rendering is used. As a solution, we introduced tiled rendering for all integration kernels except PMC (where tiled rendering is not possible). The tiles are relatively large (compared to most other renderers), and we tried to hide tile rendering as much as possible.

Of course, the film buffer in system memory means more memory usage, so make sure that you have enough RAM installed before you crank up the resolution (which is now straightforward to do). Another consequence is that the CPU has to merge render results from the various sources like local GPUs or net render slaves into the film buffers, which requires some computational power. We tried to optimize that area, but there is obviously an impact on the CPU usage. Let us know if you run into issues here. Again, increasing the "max. tile samples" option in the kernels allows you to reduce the overhead accordingly (see above)."
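
To picture what that quote describes, here is a toy sketch of a host-side film buffer that the CPU merges tile results into. It is only a mental model under my own assumptions, not Octane's actual code: the data layout, the function names and the idea of one merge per batch of "max. tile samples" are all hypothetical.

```python
# Toy model only: NOT Octane's implementation. All names and layout are hypothetical.

def make_film_buffer(width, height):
    """Host-memory film buffer: per-pixel running sum and sample count."""
    return {
        "sum": [[0.0] * width for _ in range(height)],
        "count": [[0] * width for _ in range(height)],
    }


def merge_tile(film, x0, y0, tile_sums, samples):
    """CPU-side merge of one tile returned by a GPU (or a net render slave).

    tile_sums[row][col] is the summed value of `samples` samples for that pixel.
    """
    for dy, row in enumerate(tile_sums):
        for dx, value in enumerate(row):
            film["sum"][y0 + dy][x0 + dx] += value
            film["count"][y0 + dy][x0 + dx] += samples


def resolve(film):
    """Average the accumulated samples to get the displayed image."""
    return [
        [s / c if c else 0.0 for s, c in zip(sum_row, count_row)]
        for sum_row, count_row in zip(film["sum"], film["count"])
    ]


def merges_needed(total_samples, max_tile_samples):
    """Merges per tile: larger batches mean fewer merges for the CPU to do."""
    return -(-total_samples // max_tile_samples)  # ceiling division


if __name__ == "__main__":
    film = make_film_buffer(4, 2)
    # Two "GPUs" each return the same 2x2 tile region with 2 samples per pixel.
    merge_tile(film, 0, 0, [[2.0, 4.0], [6.0, 8.0]], samples=2)
    merge_tile(film, 0, 0, [[4.0, 4.0], [2.0, 0.0]], samples=2)
    print(resolve(film))
    print(merges_needed(1000, 16), merges_needed(1000, 32))
```

In this model every extra GPU adds more tiles for the CPU to fold into the buffer, and raising the batch size cuts the number of merges, which is the trade-off the quote points at.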

P.S. If that latest 12 Titan Z GPU OctaneBench score of 1,232 is yours, Congratulations!

Sun Jan 17, 2016 2:22 pm

Thanks Tutor. Here is a video I made of the lag issue. I posted it to the builds release forum as well.
https://vimeo.com/152060121

Sun Jan 17, 2016 9:00 pm

Notiusweb wrote:Thanks Tutor. Here is a video I made of the lag issue. I posted it to the builds release forum as well.
https://vimeo.com/152060121

If a pic can be worth a thousand words, then your vid is worth millions. Keep it up!

Thu Jan 28, 2016 4:59 pm

Tutor,

I am sorry if I missed it - which case supports the X10DRX motherboard? Any Lian Li? Other brands?

Sat Feb 06, 2016 3:27 am

smicha wrote:Tutor,

I am sorry if I missed it - which case supports the X10DRX motherboard? Any Lian Li? Other brands?

Hello Smicha,

Although there might be other cases of which I'm currently unaware for the job, this is what I've found thus far:
I have five Lian Li PC-D8000s ( see either http://www.lian-li.com/en/dt_portfolio/pc-d8000/ or http://www.newegg.com/Product/Product.a ... 6811112390 [currently and most recently "out of stock", but I've also seen them for sale on eBay today:
http://www.ebay.com/itm/LIAN-LI-PC-D800 ... 1619094906
http://www.ebay.com/itm/New-LIAN-LI-PC- ... 1618751268 ] ).

Three of my D8000s are for my three X9DRXs (the same size specs as the X10s) and two are for my two EVGA SR-2s. Size- and depth-wise, the eleven-slot X10s/X9s fit almost perfectly, since the D8000s have eleven equally positioned slot connectors (and, as pointed out below, so do almost all HPTX cases). I said "almost" because:

(1) I did have to set three additional standoffs for each of the three motherboards (my X9DRXs) to get them as secure/supported as I wanted.

(2) Four of the X10s'/X9s' PCIe slots (those on the right-hand side) aren't for full-length video cards, because those slots are blocked by one of the two CPUs and its associated memory ( download/file.php?id=44540&t=1 ). I use them for short cards such as Amfeltec 4-way GPU-oriented splitter cards, EVGA GT 640s, PCIe SSD cards and the like. That leaves only seven or eight slots (depending on the cards' backing) for full-length single-width cards, or up to four for full-length double-width cards. The fourth card on the far left might need to be single width or water-cooled, or the chassis will need to be further modified to release that card's heat out of the case. And since these are physically x8-sized slots, I've inserted the x8-to-x16 riser cards that I mentioned earlier to seat the x16 video cards ( download/file.php?id=43956&mode=view ).

(3) Those risers mean that I have had to use #4-40 x 1" machine screws and associated washers to secure the full-length video cards to the chassis, since the risers raise the video cards somewhat higher than the standard attachment point at the rear of the case.

The countervailing aid is that the motherboard tray can still be easily slid in or out of the case [ http://www.lian-li.com/en/files/2013/01/PC-D8000-02.png ] for that and other tasks.

There are these Supermicro cases purposed for those motherboards :
http://www.supermicro.com/products/chas ... Q-R982.cfm
http://www.supermicro.com/products/chas ... R1K23B.cfm
http://www.supermicro.com/products/chas ... R1K23B.cfm and
http://www.supermicro.com/products/chas ... Q-R606.cfm ,
but none of them satisfy my intended use-case {well not "case" literally} as well as the D8000s do.

Also, there may be other HPTX cases such as - http://www.lian-li.com/en/products/#32/1/list - that may better suit your intended use. Generally, HPTX cases support motherboards with up to 11 single-width PCIe slots. Take a look at the Lian Li PC-V2120 for instance - http://www.lian-li.com/en/dt_portfolio/pc-v2120/ .

If you have any further questions or concerns, you know I'm always happy to help you.