
6 GPU system: some GPUs fail (software) mid-render...

Posted: Sun Dec 20, 2015 5:51 pm
by ptunstall
So I've got a 6-GPU system. It has a 980, an original Titan, two 680s, and two 460s.

I originally had it set up in my office where all my other regular render boxes are, but after rendering for long periods of time it would heat up the room too much, even though the window was fully open to let the cold air in AND a box fan was blowing that cold air onto it. The main problem with having it set up in the office was that it would trip my power breaker... other than these two issues it ran fine and there were no GPU failures.

I had a render that had been going for about 23 hours when this happened the other night, so I opted to move it downstairs and run a 50 ft network cable to it.

Now it is all set back up but in a much better configuration for cooling.

Here's how it's set up:

The 980 and the Titan run directly from the motherboard and off the main motherboard's power supply. The two 680s are fed power from the main power supply, but they're connected to an Amfeltec GPU splitter. The 460s and the Amfeltec splitter's PCIe cards are all powered by a second power supply. I'm triggering this second power supply via a relay that switches on when it sees a 12 V source from the main power supply. The ONLY differences between how it's wired up now and how it was wired up before are that it's now on a 50 ft network cable, and one of the 680s WAS running off the main motherboard PCIe but has been moved to the Amfeltec GPU splitter. When I try to power the Amfeltec GPU splitter from the main motherboard power supply, the whole system shuts down randomly, even when idling.

Now, after moving it downstairs, when I run a render (I can confirm this happening in both direct lighting and PMC), messages pop up on the host machine saying something like ".3 samples were lost", and the server machine tells me "cuda device (#) failed". If I restart the Octane server, the number of GPUs is lessened by the number of "cuda device (#) failed" messages, and I can't get all of them back online until I reboot...
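If it helps anyone else chasing this kind of thing: a quick way to see how many GPUs the driver itself still reports after one of these failures is to list them with nvidia-smi. Here's a minimal sketch, assuming nvidia-smi (it ships with the NVIDIA driver) is on the PATH; this isn't part of the setup above, just a diagnostic check:

    import subprocess

    def visible_gpus():
        # "nvidia-smi -L" prints one line per GPU the driver can see, e.g.
        # "GPU 0: GeForce GTX 980 (UUID: GPU-...)"
        result = subprocess.run(["nvidia-smi", "-L"],
                                capture_output=True, text=True, check=True)
        return [line for line in result.stdout.splitlines() if line.startswith("GPU")]

    if __name__ == "__main__":
        gpus = visible_gpus()
        print("%d GPU(s) visible to the driver:" % len(gpus))
        for line in gpus:
            print("  " + line)

Comparing that count against what the Octane server reports after a restart would at least show whether the devices dropped out at the driver level or only inside Octane.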

System Specs:

ASUS SABERTOOTH 990FX R2.0
AMD FX-9590
32 GB of RAM
PSU 1: EVGA 120-G2-1300-XR 80 PLUS
PSU 2: Corsair HX850
GTX 980
Titan
Amfeltec GPU-Oriented x4 PCIe 4-Way Splitter
GTX 680 x 2
GTX 460 x 2
Windows 7

Re: 6 GPU system: some GPUs fail (software) mid-render...

Posted: Sun Dec 20, 2015 10:11 pm
by ptunstall
SOLVED IT! By changing the way I was hooking up the GPUs and moving one of the 680s to the Amfeltec, I took out the common ground between the two PSUs. I rewired the PSUs so that there's a common ground between them, and now there's no weirdness. I discovered this while touching certain parts of the system as it was running: I gently brushed against one of the ribbon cables and, poof, the PSU shut off. So I thought to myself, wow, either that's a loose connection or there's a lot of interference coming from my hand, which means there's no common ground.

Make sure you hook up a common ground when using multiple PSUs, kids!