Best Practices For Building A Multiple GPU System

Discuss anything you like on this forum.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:Tutor et al,

What are your thoughts on an SSD being used in available PCIe lanes as far as best practices for a multi-GPU rig, ...


http://computer.howstuffworks.com/pci-express1.htm states:
“Each lane of a PCI Express connection contains two pairs of wires -- one [pair] to send and one [pair] to receive. Packets of data move across the lane at a rate of one bit per cycle. A x1 connection, the smallest PCIe connection, has one lane made up of four wires. It carries one bit per cycle in each direction. A x2 link contains eight wires and transmits two bits at once, a x4 link transmits four bits, and so on. Other configurations are x12, x16 and x32.”
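The per-lane arithmetic in that quote scales directly with link width and generation. As a minimal sketch (the transfer rates and encoding overheads below are the published per-generation PCIe figures, not anything from this thread):

```python
# Usable per-lane PCIe bandwidth = transfer rate x encoding efficiency.
# The rates and encodings are the published per-generation spec values.
GENERATIONS = {
    1: (2.5e9, 8 / 10),     # Gen1: 2.5 GT/s, 8b/10b encoding
    2: (5.0e9, 8 / 10),     # Gen2: 5 GT/s, 8b/10b
    3: (8.0e9, 128 / 130),  # Gen3: 8 GT/s, 128b/130b
}

def lane_bandwidth_mb(gen: int, lanes: int = 1) -> float:
    """Usable bandwidth in MB/s, per direction, for a link of `lanes` lanes."""
    rate, efficiency = GENERATIONS[gen]
    return rate * efficiency / 8 / 1e6 * lanes

print(f"Gen1 x1:  {lane_bandwidth_mb(1):,.0f} MB/s each way")
print(f"Gen3 x16: {lane_bandwidth_mb(3, 16):,.0f} MB/s each way")
```

So a Gen1 x1 link carries about 250 MB/s each way, and a Gen3 x16 slot roughly 985 MB/s per lane, about 15.7 GB/s per direction for the full link.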

I liken PCI-e lanes to the modern expressways. When traveling the expressway, I frequently see large, medium and small trucks, luxury cars, sports cars, large family cars, small cars, minivans and motorcycles. “Peripheral devices that use PCIe for data transfer include graphics adapter cards, network interface cards (NICs), storage accelerator devices and other high-performance peripherals.” [ http://searchdatacenter.techtarget.com/ ... CI-Express ] Those high-performance peripherals can come in many forms and serve many purposes.

CPUs determine how many PCI-e lanes a system can support, and the more CPUs per system (usually the higher-priced ones), the more PCI-e lanes can be supported. Intel labels its Xeon processors so that the first number after "E5" tells you how many CPUs can co-exist in one system. In many ways, my E5-4650 V1s have virtually the same specs [ http://ark.intel.com/products/75289/Int ... e-2_40-GHz ] as the E5-2680 V1s [ http://ark.intel.com/products/75277/Int ... e-2_80-GHz ], which support a max of 40 PCI Express lanes. Even for the higher-priced, higher-numbered CPUs, 40 lanes per CPU has been the max from V1 through V5 of the E5 line. What does differ is the number of CPUs that can co-exist. */ For example, systems that support the E5-4600s can take up to four CPUs, so four E5-4650s can support a max of 160 PCI-e lanes (4 x 40). Systems that support the E5-2600s take up to two high-end CPUs and thus can provide up to 80 lanes (2 x 40). So if one wants enough PCI-e lanes to support more GPUs and other peripherals that use PCI Express, one should look into a motherboard that supports more CPUs. Moreover, the kind of connection one uses can affect PCIe lane availability. For example, certain splitter cards (like Amfeltec's X4 Splitter card) and certain riser cables (x1, x4 or even x8) can help reduce PCI-e lane needs. A caution, though: motherboard manufacturers seemingly abhor free/unused PCI-e lanes and tend to add more functionality that relies upon PCI-e lanes for data transmission. Don't forget that motherboard designers/manufacturers have very important roles to play. Even USB and SATA data can travel down PCI-e lanes, even though you don't see a USB or SATA card occupying one of your system's PCI-e slots.
Additionally, game systems aren't known for being populated with Xeon CPUs, and there's no legal requirement that any manufacturer make its motherboards support the max lane capability of the highest-end CPUs. There are bound to be situations where someone buys a 40-lane CPU (or two or four of them) along with the appropriate motherboard and yet cannot take full advantage of them, because that functionality hasn't been fully implemented on the particular board. So pre-purchase investigation is required and compromises are likely. In the end, I fall back on an observation I made earlier: judge a motherboard's potential to satisfy a user's GPU (and, with a twist for your special question, SSD) needs by the number and "X" designation of the PCI-e slots that are visible. To be sure, one may be able to satisfactorily run more GPUs and other PCI-e based cards than there are open slots on the motherboard, but the question of how many more is left to ingenuity and a lot of luck. That's why the three motherboards I last purchased each have eleven X8-sized slots, with one of the eleven being only X4 electrically. How I have populated and will populate those slots (and there'll always be at least one SSD card in one or two of them) will greatly depend on my ingenuity and luck, or, as Seekerfinder says, "... it's trial and error." However, to reduce error, I'd recommend that a purchaser following our path just get the motherboard with the greater number of visible PCI-e slots and be happy with any surplus GPU installations.

*/ Likewise, the E5-1600’s CPUs support only one CPU per system. [ As just one example, the E5-1680 V3 supports up to 40 lanes - http://ark.intel.com/products/82767/Int ... e-3_20-GHz ]
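The lane budgeting described above (40 lanes per E5 CPU, times the number of CPUs the platform supports) can be sketched as a quick back-of-the-envelope check. The device mix below is a hypothetical build, and real boards route some lanes through chipsets and switches, so treat this as an upper bound only:

```python
# Back-of-the-envelope PCI-e lane budget. 40 lanes per E5 CPU comes from the
# ark.intel.com figures above; the device mix below is a hypothetical build.
def lanes_available(cpus: int, lanes_per_cpu: int = 40) -> int:
    """Total CPU-provided lanes, e.g. 4 x E5-4650 -> 160."""
    return cpus * lanes_per_cpu

def fits(devices: dict, cpus: int, lanes_per_cpu: int = 40) -> bool:
    """True if the summed lane demand fits within the CPUs' lane budget."""
    return sum(devices.values()) <= lanes_available(cpus, lanes_per_cpu)

# Hypothetical build: seven x16 GPUs plus one x4 SSD card = 116 lanes wanted.
build = {f"gpu{i}": 16 for i in range(7)}
build["ssd"] = 4

print(fits(build, cpus=4))  # quad E5-4600: 116 of 160 lanes used
print(fits(build, cpus=2))  # dual E5-2600: 116 > 80, does not fit
```

Note this is only the CPU side of the budget; as mentioned above, how many of those lanes actually reach the slots depends on the motherboard designer.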
Notiusweb wrote: ... as in does it work well, or is it a burden on the rig's ability to perform GPU functions. Or if you tried, what have you found works well or what doesn't work well?
For now, I'm using one of my Supermicro X9-DRXs (11 PCI-e slotted). It works as well as I had/have expected it to work and I haven't found it to be any noticeable burden on the rig's ability to perform GPU functions. Since I'm doing animations, an SSD is essential to getting the smooth playback speeds, especially for large format projects, that I need; although in the near future, I might also dedicate one of my non-GPU-rendering systems mainly to final animation review and thus install one or more additional SSDs in it.
glimpse wrote:Guys, I'm wondering, what's the most GPUs you've managed to plug into a single PSU? (Yeah, I'm aware of wattage.) But what is the highest number of GPUs you've managed to connect? =)


See below.
glimpse wrote:
itou31 wrote:On my side: 5 (2 Titan Blacks and 3 780 Tis) on a LEPA 1600W PSU
Hmm, no one tried to plug in more? A 1600W PSU seems to be overkill for 5 GPUs sipping under 1000W in total. I've heard of some stability issues with more than 5 GPUs and I'm curious if that has anything to do with reality..
For my systems (many of which have dual-GPU-processor cards), it depends on various factors such as what else the system has to power, the number of processors on the cards, the type of rendering one does (I use GPU-only rendering on some projects, GPU/CPU hybrid rendering on others, and even simultaneous GPU and CPU rendering using different renderers, depending on project needs) and whether one overclocks and to what degree. So I'd suggest that one look closely at the wattage needs for one's particular usage, one's total system needs and the particularities of the GPUs that one owns. But in general, my experience has been that five GPU processors per 1600W PSU is tops for my usages.
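A rough sizing check along those lines can be sketched as follows. The base-system draw, per-card wattage and 20% headroom margin are illustrative assumptions, not measurements from anyone's rig:

```python
# Rough PSU sizing sketch. The base-system draw, per-card wattage and 20%
# headroom margin are assumptions for illustration, not measured values.
def psu_ok(gpu_watts, base_system_watts=200, psu_watts=1600, headroom=0.20):
    """True if total steady draw stays under the PSU rating minus a headroom margin."""
    total = sum(gpu_watts) + base_system_watts
    return total <= psu_watts * (1 - headroom)

print(psu_ok([200] * 5))  # five ~200W cards fit within the margin
print(psu_ok([200] * 7))  # seven do not
```

With those assumed numbers, five ~200W cards fit under the margin and seven do not, which lines up with the five-per-1600W rule of thumb, but your own cards' TDPs and overclocks will move the line.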
Last edited by Tutor on Mon Apr 18, 2016 6:42 pm, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - character limit prevents detailed stats.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

Thanks for the info, Tutor. I've noticed several occurrences where you can't get stability out of a system with more than 5 GPUs (even on PSUs that are 1600W+), so in the end it seems it is not a coincidence. I'll have to dig deeper, but this topic is not widely covered. It's probably spikes and unreliable voltage that cause most of the problems. Thanks again.
Tutor

glimpse wrote:Thanks for the info, Tutor. I've noticed several occurrences where you can't get stability out of a system with more than 5 GPUs (even on PSUs that are 1600W+), so in the end it seems it is not a coincidence. I'll have to dig deeper, but this topic is not widely covered. It's probably spikes and unreliable voltage that cause most of the problems. Thanks again.
I believe that you've just hit the nail/spike dead on its head. Voltage reliability also matters much. Spiking isn't a rare occurrence (so says my Kill-A-Watt meter) and can cause various anomalies. So can transient voltages. We're managing lots of factors on our own in blazing our trails using various GPUs in various systems in various scenarios, etc.
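One way to think about the spike problem is to size against an assumed worst-case transient rather than steady-state draw. The 1.3x spike factor below is purely an illustrative assumption; real transients vary by card and are best measured (as with the Kill-A-Watt meter mentioned above):

```python
# Pessimistic transient check: assume every card spikes at the same moment.
# spike_factor = 1.3 is an assumed illustrative multiplier, not a measured value.
def psu_survives(steady_watts, psu_watts=1600, base=200, spike_factor=1.3):
    """True if the assumed worst-case simultaneous spike stays under the PSU rating."""
    peak = sum(w * spike_factor for w in steady_watts) + base
    return peak <= psu_watts

print(psu_survives([200] * 5))  # five ~200W cards stay under 1600W even when spiking
print(psu_survives([200] * 6))  # a sixth card pushes the assumed peak past the rating
```

Under these assumptions the system tips over between five and six cards, which matches the instability reported in this thread; a second PSU effectively raises the available wattage and restores the margin.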
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Tutor,

Thank you for helping Tom with the max-GPUs-per-PSU question. I'll try to be very precise in describing the situation where I connected 7 GPUs, got a 00 error code from my Asus X99-E WS, and it never booted again:

1. After cutting all 6 DVI ports, I tested the GPUs one by one - all worked.
2. I put on the 7-way water bridge but connected only the 1st card to the PSU - it booted and I could enter the BIOS without any problem. I did leak tests and all was working fine.
3. I connected 7 GPUs to the PSU (5 cables with dual 2x(2+6) connectors, 2 separate cables with 2+6, and the last one connected via a 2x(2+6) splitter).
4. When I turned on the machine it was trying to pass tests (blue diodes on the mobo): RAM passed, GPUs passed, CPU passed, then a quick blink at the RAM diode and it restarted. And so on.
5. I started disconnecting the GPUs one by one, trying to boot each time. Same poor results. Even with only one GPU connected it failed.
6. I flashed the BIOS with the newest one (#2006) and after this (with only one GPU connected) it gave only the 00 code on the mobo and kept restarting.
7. I disassembled the entire loop, removed the CPU and all GPUs, and ran BIOS Flashback over 20 times - no results, only the 00 code.
8. I tested the PSU on my other machine and it is fine.
9. I also removed the battery from the mobo, did many CMOS clears, and left the mobo without any devices for a long time - nothing.

I received a response from a great guy, Stanley Brusse, who had had such an issue over the course of a year; he finally used 2x 1200W PSUs (with an add2PSU tool) and managed to fire all the GPUs up. With a single PSU his machine hadn't booted once he reached the 6th card - he had had to disassemble it all, removing GPUs and PSU.

I am really surprised that Asus (no knowledge of whether this is the case for Supermicro) does not boot with a single PSU for more than 5 GPUs, but requires 2 of them.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Seekerfinder
Licensed Customer
Posts: 1600
Joined: Tue Jan 04, 2011 11:34 am

smicha wrote:Tutor,

Thank you for helping Tom with the max-GPUs-per-PSU question. I'll try to be very precise in describing the situation where I connected 7 GPUs, got a 00 error code from my Asus X99-E WS, and it never booted again:

1. After cutting all 6 DVI ports, I tested the GPUs one by one - all worked.
2. I put on the 7-way water bridge but connected only the 1st card to the PSU - it booted and I could enter the BIOS without any problem. I did leak tests and all was working fine.
3. I connected 7 GPUs to the PSU (5 cables with dual 2x(2+6) connectors, 2 separate cables with 2+6, and the last one connected via a 2x(2+6) splitter).
4. When I turned on the machine it was trying to pass tests (blue diodes on the mobo): RAM passed, GPUs passed, CPU passed, then a quick blink at the RAM diode and it restarted. And so on.
5. I started disconnecting the GPUs one by one, trying to boot each time. Same poor results. Even with only one GPU connected it failed.
6. I flashed the BIOS with the newest one (#2006) and after this (with only one GPU connected) it gave only the 00 code on the mobo and kept restarting.
7. I disassembled the entire loop, removed the CPU and all GPUs, and ran BIOS Flashback over 20 times - no results, only the 00 code.
8. I tested the PSU on my other machine and it is fine.
9. I also removed the battery from the mobo, did many CMOS clears, and left the mobo without any devices for a long time - nothing.

I received a response from a great guy, Stanley Brusse, who had had such an issue over the course of a year; he finally used 2x 1200W PSUs (with an add2PSU tool) and managed to fire all the GPUs up. With a single PSU his machine hadn't booted once he reached the 6th card - he had had to disassemble it all, removing GPUs and PSU.

I am really surprised that Asus (no knowledge of whether this is the case for Supermicro) does not boot with a single PSU for more than 5 GPUs, but requires 2 of them.
Hi Smicha,
Thanks for sharing your experience here. Which specific single PSU did you try (model & wattage)? It sounds like an inrush issue, and the spec/quality of the PSU would be critical here. What do you think? In point 8 you said you tested the PSU, but did you try other units?
Seeker
Win 8(64) | P9X79-E WS | i7-3930K | 32GB | GTX Titan & GTX 780Ti | SketchUP | Revit | Beta tester for Revit & Sketchup plugins for Octane
smicha

Seekerfinder wrote:Hi Smicha,
Thanks for sharing your experience here. Which specific single PSU did you try (model & wattage)? It sounds like an inrush issue, and the spec/quality of the PSU would be critical here. What do you think? In point 8 you said you tested the PSU, but did you try other units?
Seeker

The best available 1600W PSU - the EVGA SuperNOVA T2 Titanium. Today I am receiving a new mobo, and if needed I'll get an extra PSU. I'll send more feedback soon.
Seekerfinder

smicha wrote:
Seekerfinder wrote:Hi Smicha,
Thanks for sharing your experience here. Which specific single PSU did you try (model & wattage)? It sounds like an inrush issue, and the spec/quality of the PSU would be critical here. What do you think? In point 8 you said you tested the PSU, but did you try other units?
Seeker

The best available 1600W PSU - the EVGA SuperNOVA T2 Titanium. Today I am receiving a new mobo, and if needed I'll get an extra PSU. I'll send more feedback soon.
That's a great PSU. It could still be inrush, though. Measuring at the wall might be useful.
The dual-CPU / Xeon boards seem to be more stable with 5+ GPUs. Are you getting another X99-E WS or a different board this time?
Cheers,
Seeker
Tutor

smicha wrote:Tutor,

Thank you for helping Tom with the max-GPUs-per-PSU question. I'll try to be very precise in describing the situation where I connected 7 GPUs, got a 00 error code from my Asus X99-E WS, and it never booted again:

1. After cutting all 6 DVI ports, I tested the GPUs one by one - all worked.
2. I put on the 7-way water bridge but connected only the 1st card to the PSU - it booted and I could enter the BIOS without any problem. I did leak tests and all was working fine.
3. I connected 7 GPUs to the PSU (5 cables with dual 2x(2+6) connectors, 2 separate cables with 2+6, and the last one connected via a 2x(2+6) splitter).
4. When I turned on the machine it was trying to pass tests (blue diodes on the mobo): RAM passed, GPUs passed, CPU passed, then a quick blink at the RAM diode and it restarted. And so on.
5. I started disconnecting the GPUs one by one, trying to boot each time. Same poor results. Even with only one GPU connected it failed.
6. I flashed the BIOS with the newest one (#2006) and after this (with only one GPU connected) it gave only the 00 code on the mobo and kept restarting.
7. I disassembled the entire loop, removed the CPU and all GPUs, and ran BIOS Flashback over 20 times - no results, only the 00 code.
8. I tested the PSU on my other machine and it is fine.
9. I also removed the battery from the mobo, did many CMOS clears, and left the mobo without any devices for a long time - nothing.

I received a response from a great guy, Stanley Brusse, who had had such an issue over the course of a year; he finally used 2x 1200W PSUs (with an add2PSU tool) and managed to fire all the GPUs up. With a single PSU his machine hadn't booted once he reached the 6th card - he had had to disassemble it all, removing GPUs and PSU.

I am really surprised that Asus (no knowledge of whether this is the case for Supermicro) does not boot with a single PSU for more than 5 GPUs, but requires 2 of them.

Smicha,

For starters, I don't want to offend anyone with what follows. But since you described an experience with Asus and an error code that troubles me to this day, and asked me about it, I'll let you know how and why I feel the way I do. About three years ago, I purchased not one but two new Asus Z9PE-D8 WS workstation motherboards for GPU rendering. They were without a doubt the worst purchases I've ever made in my life. My first Asus build handed out 00 codes so incessantly that I thought it had an affinity for James Bond but just couldn't add a "7" after the double "0s". In other words, the BIOS corrupted before I ever got the system running, and the retailer replaced that board twice (each time after double-"0" incidents with the first board and its two replacements). I still hadn't opened the box containing the second Asus board I had purchased. When I finally concluded that those three prior Asus boards had too much bad BIOS in common, and opened the second one to see whether it would make me think otherwise, it quickly reminded me of the other three: after booting twice, it exhibited the same crappy BIOS behavior. Coincidence between those four mother- - - - - - -boards {just count the dashes and use your imagination}? I think not. Coincidence with your story? I think not. I will never purchase anything with the name "Asus" on it again. Let's just say Asus drove me to become a customer of Tyan and Supermicro - both of whom I've held and still hold in great esteem. Moreover, if I place too many GPUs in my Tyan or any of my five Supermicros, they'll still boot even if the GPU card count is too high, and never have I experienced a corrupted BIOS with any of them - or, for that matter, with any of the gaming PCs that I've ever enlisted for GPU rendering.
Asus deserves and holds a one-of-a-kind place on my wall of shame, and I've had four of their boards as proof. In sum, I am NOT really surprised at what you describe, because it involves Asus, and the only words of advice I can offer are: "Never Again Asus."
smicha

Tutor,

You read my thoughts. The first board that corrupted - it didn't even give any signal after 1 hour of working - was the one I got for Yam (4x 980 Ti). This one gave 00 after connecting 7 GPUs. Within 2 hours I am receiving the new one, and you know what - I am ------ scared to use it at all, but I have no choice.
smicha

Tutor,

One question: did you RMA the 00 Asus boards, and were you given a refund?