Best Practices For Building A Multiple GPU System

Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute


BTW - This is the latest version of the Tyan Server [ Tyan FT77AB7059 (B7059F77AV6R-2T) Dual LGA2011 2400W 4U Rackmount Server Barebone System -- http://www.superbiiz.com/detail.php?name=TS-B759F2T# -- http://www.tyan.com/Barebones_FT77AB705 ... F77AV6R-2T ]. I have the 1366 version that uses Westmere Xeons [ http://www.tyan.com/Barebones_FT72B7015_B7015F72V2R ]. The Tyan system is a good way to go if one has the money and just wants to buy the system and populate it with up to eight GPUs, without having to worry about risers, splitters, etc., all for what I consider a reasonable price.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Tutor,

Thank you. I've been looking for a smaller, workstation-style case for the most CPU-powerful motherboard. The Supermicro seems great, even at over 2k, since it comes with the PSU and all fans inside, right?
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute


BTW - This is the easiest way to go, if one has the money and wants to install no more than 8 GPUs that will all run all-out in x16 PCIe sockets - http://www.superbiiz.com/detail.php?name=TS-B75F7V6 . I have the LGA 1366 version of this system. It's all plexed out. If GPU TDPs exceed 250W, the redundancy provided by the third PSU begins to drop, so I wouldn't recommend it for dual-GPU cards like the Titan Z. One could use two E5-4650 v1 ES QBEDs, as referenced above [ viewtopic.php?f=40&t=43597&start=380#p264358 ], for CPUs to help reduce total cost of ownership. Watercooling would, however, pose some challenges.
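
For anyone sizing a similar build, here is a rough power-budget sketch of why the redundancy falls off above 250W per card. The PSU layout and overhead figures below are my own illustrative assumptions, not the Tyan spec sheet, so substitute your own numbers.

Code: Select all

# Back-of-the-envelope PSU redundancy check - a sketch, not a spec.
# Assumed numbers (NOT from the Tyan datasheet - substitute your own):
# three 1200 W supplies in a 2+1 redundant layout (2400 W usable with
# one supply lost) and ~400 W for CPUs, RAM, fans, and drives.

def redundancy_holds(gpu_count, gpu_tdp_w, system_overhead_w=400,
                     psu_count=3, psu_watts=1200, redundant_psus=1):
    """True if the load still fits after losing `redundant_psus` supplies."""
    load_w = gpu_count * gpu_tdp_w + system_overhead_w
    usable_w = (psu_count - redundant_psus) * psu_watts
    return load_w <= usable_w

for tdp in (225, 250, 300, 375):  # 375 W ~ a dual-GPU card like the Titan Z
    ok = redundancy_holds(gpu_count=8, gpu_tdp_w=tdp)
    print(f"8 x {tdp} W GPUs: redundancy {'holds' if ok else 'is lost'}")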
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

smicha wrote:Tutor,

Thank you. I've been looking for a smaller, workstation-style case for the most CPU-powerful motherboard. The Supermicro seems great, even at over 2k, since it comes with the PSU and all fans inside, right?
Correct.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Hello! Tom G. successfully ran a Baking Cam test on my 'crash' scene @ 3840 x 2160 for 10,000 samples without issue! :D

Tom, itou31, and I each have an Amfeltec product that connects at PCIe x1; however, Tom has had no freezes while itou31 and I do.
I'm thinking this could now be narrowed down to a Windows-only phenomenon.
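
If anyone wants to double-check what link each card actually negotiated (splitters and risers don't always train at the width you expect), here is a quick Python sketch around nvidia-smi's query mode; it assumes the NVIDIA driver's nvidia-smi tool is on the PATH.

Code: Select all

# Print the negotiated PCIe generation and lane width for each GPU.
# Assumes the NVIDIA driver's nvidia-smi utility is installed and on PATH.
import subprocess

fields = "name,pcie.link.gen.current,pcie.link.width.current"
out = subprocess.check_output(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    text=True,
)
for line in out.strip().splitlines():
    name, gen, width = [part.strip() for part in line.split(",")]
    print(f"{name}: PCIe Gen {gen}, x{width}")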
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

Notiusweb wrote:Hello! Tom G. successfully ran a Baking Cam test on my 'crash' scene @ 3840 x 2160 for 10,000 samples without issue! :D

Tom, itou31, and I each have an Amfeltec product that connects at PCIe x1; however, Tom has had no freezes while itou31 and I do.
I'm thinking this could now be narrowed down to a Windows-only phenomenon.
I'm running Win7 as well, but I'm using a different product: a backplane instead of a splitter or cluster.

I'll swap the host card from a Gen3 to a Gen2 slot on my motherboard to see if that makes any change.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

OK, so yesterday I tried to run Notius' file up to 10k samples on my PC with the eGPU plugged into an x8 Gen3 slot
(but since the host card is physically smaller and does not have enough contacts, it's forced to run at x1),

and then today I also plugged the eBox into a MacBook Air, using an Akitio to connect the PCIe host card to Thunderbolt.

As you can see from the pair of images below, it seems it went just right, no hangs - I ran that multiple times.
[Attached screenshots: 10k no issues.JPG, 10 k on mba.png]
I do have other splitters as well, so I'll try them once I have some time.
Now I've got curious whether that stability issue is Octane v3 related or caused by Amfeltec's hardware.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:Tutor, I'd be interested to see your rig running OR3. I don't remember if you have any cards connected at PCIe x1, but a couple of us with x1 connections are getting crashes on scenes where the PC actually freezes and reboots. I have one scene in particular that I sent to developer Abstrax where I can reproduce it over and over again, even at different "time-'til-freeze" speeds depending on resolution (higher res = faster crash). And we don't see the crash when using an x16 connection. I also found that all it takes is for one card to be at x1, and I get the crash on the scenes that 'provoke' it. I was wondering: if crashes occur at x1 and not x16, would they occur at x4 and x8? You know how I would test? I would send you the scene and you would run it at x4. :lol:
And the key thing is, it never, ever occurs when using V2, or in general when using the PC. SeekerFinder posted a comment in the development-build thread that made me think the move to CUDA 7 may be behind it, as it is new CUDA code for Octane.

As is written by Mother Goose:
"bits and bytes bit the GPU, that made the build go rat-tat-too...
and when coders code with new code new, the code could-go then coo-coo-coo" .
Ran it on MacOS 10, Win7, and Linux Mint 17 with x1, x4, x8, and x16 powered and non-powered risers, and on Amfeltec x4 GPU-Oriented Splitters, without issue. Of course, that doesn't rule out a defect (or other causation) in any particular piece of hardware that you're using.
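
As a purely back-of-the-envelope thought on why higher resolutions might provoke the freeze faster on an x1 link: if V3 ships the full film buffer across the bus more often than V2 did, an x1 lane saturates very quickly. The buffer layout below is an assumption for illustration only, not Octane's actual internal transfer format.

Code: Select all

# Rough transfer-time arithmetic. The 4-channel float buffer layout is
# an illustrative assumption; Octane's real internal format may differ.
width, height = 3840, 2160
bytes_per_pixel = 4 * 4              # RGBA, 32-bit float per channel (assumed)
buffer_mb = width * height * bytes_per_pixel / 1e6

x1_gen1_mb_s = 250                   # ~250 MB/s per lane at PCIe Gen 1
x16_gen3_mb_s = 16 * 985             # ~985 MB/s per lane at PCIe Gen 3

print(f"Film buffer: ~{buffer_mb:.0f} MB per transfer")
print(f"x1 Gen 1:  ~{buffer_mb / x1_gen1_mb_s:.2f} s per transfer")
print(f"x16 Gen 3: ~{buffer_mb / x16_gen3_mb_s * 1000:.1f} ms per transfer")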
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Tutor, this is a Rainbow of a puzzle.

(1) I myself have now found that my USB risers, all by themselves, without the Amfeltec powered on, run V3 fine in all testing scenarios, even with my most crash-worthy scenes. Then, if I completely un-power and remove the USB risers and go strictly with the Amfeltec, it freezes. So this would point the finger at the cluster (the device-isolation sketch at the end of this post shows the testing approach).

As such, I contacted Amfeltec, who recommended a series of debug tests, ultimately leading to a recommendation that I use an external screw-in grounding wire running from the PSU to the Amfeltec stand to the PC, implying that my PSU may be causing an issue. I tried this and it did not produce any different results; I still got crashes. But given that my system can in fact run V3 with my x16 slots and the USB-riser cards, and all runs well with the Amfeltec on V2, it's hard to imagine my system is at fault per se.

Then I saw this post which was too coincidental to ignore:

Re: 10 GPU open rig (inexpensive)
Postby asher » Wed Sep 09, 2015 12:58 am
The "Cluster" solution is not reliable, go for the splitter. Those cluster boards have electrical grounding problems and fail randomly. trust me i have had 4 of them!


When I asked Amfeltec, since they were asking me to use a grounding wire, whether they are aware of any issue, they never answered yes or no, but rather suggested it might be better for them to test in their lab. So they are going to be testing with my Red Car crash scene.

So, this for me points to Cluster.

(2) There is another user who said:
Re: OctaneRender™ Standalone 3.00 alpha 4 [latest 3.xx]
Postby tomabobu » Tue Jan 26, 2016 4:08 am
Today I've tried to explore the baking feature in C4D, and I've found that if I use any of the three 970s on my Amfeltec expansion cluster, I get a system freeze for 20 seconds, a black screen for another 10, and after that a system shutdown.
If I only use the GPUs installed in my box, everything is OK.
The scene doesn't use any textures, and I've also tried setting the parallel samples to 1. I also tried the scene in Standalone, but I get the same problem.

Re: OctaneRender™ Standalone 3.00 alpha 4 [latest 3.xx]
Postby tomabobu » Wed Jan 27, 2016 2:34 am
Just tested one of the cards from the cluster by installing it in my case, and everything is OK with the baking if I don't use the cluster expansion cards.
As soon as I use even just one card from the expansion, I get a crash. And this happens only when using the baking camera. Normal rendering works like a charm.


Points at Cluster.


(3) But, then also we have:

Re: OctaneRender™ Standalone 3.00 alpha 4 [latest 3.xx]
Postby ff7darkcloud » Fri Feb 12, 2016 10:13 pm
Also, using a 1x riser configuration for 3 of my 4 cards, I am having problems too. Should I use a 4x splitter from Amfeltec? Is it reliable? Still, I hope this will get fixed.

Also, if someone knows how to fix this Lua script for use on Octane 3, I would greatly appreciate it.
viewtopic.php?f=73&t=41365

Thanks.


Points to "1x riser", and he even asks if he might get better results with Amfeltec.


(4) And then itou31 adds more:
Re: External Graphics Cards PC
Postby itou31 » Wed Mar 02, 2016 6:04 pm
Hi,

OK, tested with v3 alpha 6: freeze (x1 USB3) and freeze (Amfeltec).
My max resolution is near 2400 x 2400 without a freeze... without playing too much with settings while rendering.


Points to a USB 3 riser and/or Amfeltec 4-way Splitter


(5) And now, enter Tutor:
Re: Best Practices For Building A Multiple GPU System
Post by Tutor » Thu Mar 03, 2016 2:36 pm

Ran it on MacOS 10, Win7, and Linux Mint 17 with x1, x4, x8, and x16 powered and non-powered risers, and on Amfeltec x4 GPU-Oriented Splitters, without issue. Of course, that doesn't rule out a defect (or other causation) in any particular piece of hardware that you're using.


(6) Honorable mention: Boris Goreta and Tom Glimpse tested V3 with their x1 connections and did not experience crashes; however, their devices didn't overlap with those of the users getting crashes.
Re: OctaneRender™ Standalone 3.00 alpha 4 [latest 3.xx]
Postby BorisGoreta » Sun Feb 14, 2016 7:36 pm
Where can I download the hi-res baking scene to test?

Here everything works exactly the same as in version 2 (18 GPUs), but I haven't tested this scene yet.

I don't know the makes of the PCIe extensions because I inherited them with the bitcoin-mining cases I bought. But they are just normal cables (the ones without the USB cable) with an extra power socket sticking out (the old white power socket with only 1 of the 4 big pins present).

Re: Best Practices For Building A Multiple GPU System
Postby glimpse » Wed Feb 17, 2016 9:03 am
OK, so yesterday I tried to run Notius' file up to 10k samples on my PC with the eGPU plugged into an x8 Gen3 slot (Amfeltec 4-way PCIe Backplane)
(but since the host card is physically smaller and does not have enough contacts, it's forced to run at x1),

and then today I also plugged the eBox into a MacBook Air, using an Akitio to connect the PCIe host card to Thunderbolt.

As you can see from the pair of images below, it seems it went just right, no hangs - I ran that multiple times.





So, we have:
2 users reporting an Amfeltec Expansion Cluster issue,
2 users reporting a USB riser issue - but then I can run with USB risers without crashes,
1 user (itou31) reporting an Amfeltec 4-way splitter issue - but then Tutor can run V3 without crashes.

What the heck!

Or, as they say in Canada:
"Whoot the Fook"
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
itou31
Licensed Customer
Posts: 377
Joined: Tue Jan 22, 2013 8:43 am

Hi,

We do not have the same scene, the same resolution, or the same settings. Send me your Red Car scene also.
But we know that v2 never freezes. V3 adds more data transfer to the system, so perhaps there is something to dig into on x1 PCIe links at Gen 1, Gen 2, or Gen 3. My 1-to-4 splitter seems to be Gen 1 (bandwidth of 2.5 GT/s). The 1-to-3 splitter is specified at Gen 2 (5 GT/s), and the cluster is Gen 2 (5 GT/s).
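
To put numbers on those link generations, here is a small sketch of per-lane throughput. The effective figures account for line-encoding overhead (8b/10b on Gen 1 and Gen 2, 128b/130b on Gen 3):

Code: Select all

# Per-lane PCIe throughput by generation. The raw rate is in GT/s; the
# effective rate subtracts line-encoding overhead (8b/10b for Gen 1-2,
# 128b/130b for Gen 3).
PCIE_GENS = {
    # gen: (raw GT/s, encoding efficiency)
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}

for gen, (gts, eff) in PCIE_GENS.items():
    mb_per_s = gts * eff * 1000 / 8   # GT/s -> Gbit/s -> MB/s per lane
    print(f"Gen {gen}: {gts} GT/s raw, ~{mb_per_s:.0f} MB/s effective per lane")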
i7-3930K, 64GB RAM, Win8.1 Pro, main: 3 Titans + 780Ti
Xeon 2696V3, 64GB RAM, Win8.1/Win10/Win7, 2x 1080Ti + 3x 980Ti + 2x Titan Black