Guys,
I am starting a build with 7x 980 Ti on water on an Asus X99-E WS 3.1 with a 5930K (I'll try to reach a 1000 score). Is there anything else apart from enabling 4G decoding I should pay attention to? Any important remarks?
Thank you.
Best Practices For Building A Multiple GPU System
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
smicha wrote:
Guys,
I am starting a build with 7x 980 Ti on water on an Asus X99-E WS 3.1 with a 5930K (I'll try to reach a 1000 score). Is there anything else apart from enabling 4G decoding I should pay attention to? Any important remarks?
Thank you.

In my case I was able to get 10 GPUs working without needing to enable 4G decoding, and I never found any performance boost from the BIOS setting itself. It was needed only to push me past the board's own default 'limit'. So with 7 GPUs, whether you enable 4G decoding may not matter. Just my 0.125 cents.
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Thank you, Notiusweb.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Tutor
smicha wrote:
Guys,
I am starting a build with 7x 980 Ti on water on an Asus X99-E WS 3.1 with a 5930K (I'll try to reach a 1000 score). Is there anything else apart from enabling 4G decoding I should pay attention to? Any important remarks?
Thank you.

I'm sure you'll be paying attention to these two other most important variables, but I'll mention them for anyone else who may be new to this endeavor: (1) everything having to do with powering the system and GPUs: adequate capacity (TDP x 1.3) and sustained/clean/safe power (such as not overloading the circuit, safe wiring, and circuit demand monitoring/breakers); and (2) everything having to do with cooling (such as GPU backplates, adequate GPU spacing, and direct GPU, system, and room cooling).
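To make the power point concrete for a 7x 980 Ti build, here is a minimal back-of-the-envelope sketch in Python. The 250 W per-GPU TDP, 140 W CPU, 150 W for everything else, 90% PSU efficiency, and the single 120 V / 15 A circuit are assumed figures for illustration only - substitute your own parts and local wiring.

[code]
# Back-of-the-envelope power budget for a multi-GPU rig (assumed figures, adjust to your parts).
GPU_TDP_W = 250     # assumed TDP of one GTX 980 Ti
NUM_GPUS  = 7
CPU_TDP_W = 140     # assumed TDP of an i7-5930K
OTHER_W   = 150     # assumed motherboard, RAM, pumps, fans, drives
HEADROOM  = 1.3     # the TDP x 1.3 rule of thumb mentioned above

load_w = NUM_GPUS * GPU_TDP_W + CPU_TDP_W + OTHER_W
psu_w  = load_w * HEADROOM

# Wall-circuit check: assumed 120 V / 15 A breaker, kept to ~80% continuous load.
circuit_w      = 120 * 15 * 0.8
psu_efficiency = 0.90          # assumed for an 80 PLUS Gold/Platinum unit

print(f"Estimated DC load:      {load_w:.0f} W")
print(f"Suggested PSU capacity: {psu_w:.0f} W (load x {HEADROOM})")
print(f"Wall draw at full load: {load_w / psu_efficiency:.0f} W vs. {circuit_w:.0f} W usable on one circuit")
# If the wall draw exceeds the circuit figure, split the PSUs across separate circuits.
[/code]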
Notiusweb wrote:
In my case I was able to get 10 gpu without needing to enable 4g decoding, and I never found I got any performance boost from a setting in BIOS. It was needed only to push me past the board's own default setting 'limit'. So if I had 7 gpu, whether I enabled 4g decoding didn't matter. Just my 0.125 cents.
Enabling 4G decoding gives a system more IO address space with which to recognize peripherals such as, but certainly not limited to, GPUs. However, keep in mind that different motherboards carry different chips and support different peripherals (with different IO space needs), different OSes support different amounts of IO space, and even different GPUs have different IO space requirements. For example, Linux tends to support more GPUs than Windows does, and newer GPUs may need more IO space for newer chipsets and newer peripheral support, and also tend to carry more GPU memory. Thus, my GTX 590s (1.5G/GPU processor) require less IO space than a GTX Titan Z (6G/GPU processor), and a GTX 780 Ti requires a lot more IO space than a regular GTX 780. Since we likely have different motherboards and different GPUs, it is difficult to say at what point enabling 4G decoding becomes mandatory for any particular user - enabling it only provides more IO space for system components, up to a point strongly influenced by the OS, and how much more is highly individual to each system. So I always enable 4G decoding - it costs nothing, and doing so means, as Forrest Gump would put it, "It's just one less thing that you have to worry about."
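To make the IO-space point a bit more concrete: on Linux you can add up the PCI BAR (base address register) space each GPU requests - it is those requests that can overflow a board's 32-bit window and make above-4G decoding necessary. A minimal sketch (Linux sysfs only; the 0x10de NVIDIA vendor ID filter is the only assumption beyond the standard sysfs layout):

[code]
# Sum the PCI BAR address space requested by each NVIDIA GPU (Linux sysfs sketch).
import glob, os

NVIDIA_VENDOR = "0x10de"   # PCI vendor ID for NVIDIA

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    with open(os.path.join(dev, "vendor")) as f:
        if f.read().strip() != NVIDIA_VENDOR:
            continue
    total = 0
    with open(os.path.join(dev, "resource")) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue
            start, end, _flags = (int(x, 16) for x in parts)
            if end > start:            # skip unused BAR slots
                total += end - start + 1
    print(f"{os.path.basename(dev)}: ~{total / 2**20:.0f} MiB of BAR address space")
[/code]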
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Notiusweb wrote:
***Bonus Question***
(a) If Tutor has the Amfeltec 4-way GPU Splitter going to mobo,
(b) and that splitter connects to the mobo with a 4x male connector into his board's 8x female connector,
(c) let's say using a total of 4 GPUs on the splitter,
that means that each card potentially runs at a max bandwidth of:
(1) PCIE 4x (each GPU can run simultaneously at the max connection size of 4x), or
(2) PCIE 1x (each GPU runs at 4x/# GPU, or 4/4 = 1)
...
UPDATE
Amfeltec's tech support says that both the 4-way and 3-way splitters connect to GPUs at the same transfer rate of x1 per GPU (the newer 3-way x1 splitter has a faster chip, so it too runs at an x1 rate).
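To put that x1-per-GPU figure in perspective, here is a rough Python sketch using the commonly quoted per-lane rates (PCIe 2.0 ≈ 500 MB/s per lane, PCIe 3.0 ≈ 985 MB/s per lane); the splitter generation and the 2 GB scene size are assumptions for illustration. Bear in mind this mainly affects how long scene uploads take, not the rendering itself once the data is on the card.

[code]
# Rough PCIe bandwidth sketch for GPUs behind an x1 splitter (approximate figures).
MB_PER_LANE = {"gen2": 500, "gen3": 985}   # rough usable MB/s per lane

def link_mb_s(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link."""
    return MB_PER_LANE[gen] * lanes

scene_mb = 2000  # assumed 2 GB scene upload, purely for illustration

for label, gen, lanes in [("mobo slot x16", "gen3", 16),
                          ("mobo slot x8",  "gen3", 8),
                          ("splitter x1",   "gen2", 1)]:
    bw = link_mb_s(gen, lanes)
    print(f"{label:>14}: ~{bw:6.0f} MB/s, ~{scene_mb / bw:5.1f} s for a {scene_mb} MB scene")
[/code]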
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Going for 1,000 with 7 980 Ti's, that would be ~143 for each GPU.
I see one result with a total of 1017 for 8 GPUs = ~127 per GPU, and the max score recorded for a single 980 Ti is 142!
That would be impressive...
Looks like the highest known per-GPU score is a Titan X at 149...
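A quick sanity check of that arithmetic (the totals and the 142/149 single-card figures are just the numbers mentioned above):

[code]
# Per-GPU score from a multi-GPU benchmark total (numbers quoted above).
def per_gpu(total_score: float, num_gpus: int) -> float:
    return total_score / num_gpus

print(per_gpu(1000, 7))   # smicha's target: ~142.9 per 980 Ti
print(per_gpu(1017, 8))   # the 8-GPU result mentioned above: ~127.1 per GPU
# Best single-card results quoted above: 980 Ti ~142, Titan X ~149.
[/code]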
I've seen speculative web-articles that the new NVidia GPU ("1080", "Titan Y"?) may have 16GB VRAM (some sites even claim 32GB VRAM) and I've seen as high as 6,144 CUDA cores.
We'll find out April 5th.
12 x 6,144 = 73,728 CUDA
FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU..........................................................N!*
*'CK' may be substituted for 'N'
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Notiusweb wrote:
Going for 1,000 with 7 980 Ti's, that would be ~143 for each GPU.
I see one result with a total of 1017 for 8 GPUs = ~127 per GPU, and the max score recorded for a single 980 Ti is 142!
That would be impressive...
Looks like the highest known per-GPU score is a Titan X at 149...

Oh yeah, it will be an awesome build!
Notiusweb wrote:
I've seen speculative web articles that the new Nvidia GPU ("1080", "Titan Y"?) may have 16GB VRAM (some sites even claim 32GB VRAM) and I've seen as high as 6,144 CUDA cores.
We'll find out April 5th.
Stop reading sh)*&.. That high-end card is the M6000, silently updated by Nvidia from 12 to 24GB =).. http://anandtech.com/show/10179/nvidia- ... adro-m6000 Do you really think it would make any sense for Nvidia to release gaming gear with 32GB? And with HBM, which is in shortage now, there's little to no chance of such a stunt.. and no economic benefit for Nvidia..
Don't set your expectations too high or you'll be very unhappy after GTC =)
I am sure we will find Pascal gaming cards with 32GB, but these will be the ones released towards the end of the cycle - it is a cow-milking business.
glimpse wrote:
Stop reading sh)*&..

I can't help it.

glimpse wrote:
Don't set your expectations too high or you'll be very unhappy after GTC =)

They said last year 10x faster than Maxwell... that means a core clock of about 10,000... right? RIGHT?
Refracty wrote:
I am sure we will find Pascal gaming cards with 32GB, but these will be the ones released towards the end of the cycle - it is a cow-milking business.

Yeah, there are gamers who go for quad-SLI Titan X with multi-screen 4K displays... they put us render crowd to shame sometimes. I could see them making an argument why a 32GB card would be valuable to gamers as well.
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Regarding the 32G cards with 10x the performance of Maxwell (of which 50% of the 10x is attributable to NVLINK), I tend to believe the article written long ago (8 months in tech time) by WCCF Tech (http://wccftech.com/billion-transistors ... s-in-2016/) and that the 10x performance increase will come only to systems (new [or newest] ones, and not likely to most of us seeking to add to our existing PCIe-styled systems) that are capable of supporting a modified PCIe architecture:
"The Pascal GPU would also introduce NVLINK [ see, e.g. https://www.youtube.com/watch?v=RBf8FLS6q8E & https://www.youtube.com/watch?v=gFLUgAi9g50 ] which is the next generation Unified Virtual Memory link with Gen 2.0 Cache coherency features and 5 – 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high performance GPUs currently face. One of the latest things we learned about NVLINK is that it will allow several GPUs to be connected in parallel, whether in SLI for gaming or for professional usage. Jen-Hsun specifically mentioned that instead of 4 cards, users will be able to use 8 GPUs in their PCs for gaming and professional purposes.
With Pascal GPU, NVIDIA will return to the HPC market with new Tesla products. Maxwell, although great in all regards was deprived of necessary FP64 hardware and focused only on FP32 performance. This meant that the chip was going to stay away from HPC markets while NVIDIA offered their year old Kepler based cards as the only Tesla based options. Pascal will not only improve FP64 performance but also feature mixed precision that allows NVIDIA cards to compute at 16-bit at double the accuracy of FP32. This means that the cards will enable three tiers of compute at FP16, FP32 and FP64. NVIDIA’s far future Volta GPU will further leverage the compute architecture as it is already planned to be part of the SUMMIT and Sierra super computers that feature over 150 PetaFlops of compute performance and launch in 2017 which indicates the launch of Volta just a year after Pascal for the HPC market."
I don't believe that Jen-Hsun had current gamer PCs (and most current Supermicro, Tyans or other similar bigger-iron) in mind, unless there'll be a chassis add-on for current systems that allows easy support for 8 GPUs - currently something which most of us with 8 or more GPUs per system have had to cobble together with risers, splitters and the like. Moreover, NVLINK's initial release appears to be IBM Power (and maybe ARM-64) processor centric. So a lot depends on how NVLINK (which is responsible for one-half of that 10x performance increase) is implemented. In sum, desire for research/manufacturing cost recovery and manufacturer greed may relegate the highest performance to computer systems not yet released and whose cost will be extremely high. Thus, I'm casting my prediction closer to Refracty's and Glimpse's - max performance will likely be had only by those with deep (but fully filled) pockets. In any event, I'm not being a sour puss because I've never seen before even a 2x increase per GPU processor in rendering speed and that appears to be soon clearly attainable even on our present hardware.
P.S.
(1) There could be external box add-ons, but to my knowledge the prices of those too, in the past, have caused nose bleeds.
(2) Just as I have urged caution/care in spending a lot on Maxwells with Pascals coming around the corner, I'd also urge caution/care in high end spending to buy any current system until we know how NVLINK will be hardware implemented. So get a current system only if you really need it before then, but if you can wait you might be better off. Newer systems might make issues we've previously discussed, in detail, (like those involving Octane 3 changes from v2) relics of past systems. However, I also recognize that there'll certainly be new ones to discuss.
(3) If you need a high end multi-GPU rendering system before (or even after) Pascal NVLINK hardware is released, you may want to consider, among others, a SuperX10DRG-OT+-CPU-MNL-1780 [ http://www.supermicro.com/products/syst ... GR-TRT.cfm ] ($3250 @SabrePC) + 2 x E5-2620 v3s for ~ $400 each from Superbiiz [ http://www.superbiiz.com/detail.php?name=E52620V3BX ]. The SuperX10DRG-OT+-CPU has 8 PCI-E 3.0 x16 (double-width) slots plus 2 PCI-E 3.0 x8 (in double-width x16 size) slots. You'll likely need to purchase storage, ram and additional PSU(s) and GPUs + you'll have to install the ram, storage, CPUs, GPUs and connect the external PSUs to GPUs.
(4) I'll be regularly using - for Google searches - phrases such as " NVLINK capable motherboard and NVLINK GPU enclosure/box/case."
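To put the quoted "5 – 12 times the bandwidth of a regular PCIe connection" into rough numbers, here's a tiny sketch assuming a PCIe 3.0 x16 baseline of ~15.75 GB/s usable per direction; the multipliers are simply the range from the WCCF Tech quote above:

[code]
# Rough comparison of PCIe 3.0 x16 vs. the NVLink range quoted in the article above.
PCIE3_X16_GB_S = 15.75     # approx. usable one-direction bandwidth of PCIe 3.0 x16

for mult in (5, 12):
    print(f"{mult}x PCIe 3.0 x16 ≈ {PCIE3_X16_GB_S * mult:.0f} GB/s")
[/code]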
"The Pascal GPU would also introduce NVLINK [ see, e.g. https://www.youtube.com/watch?v=RBf8FLS6q8E & https://www.youtube.com/watch?v=gFLUgAi9g50 ] which is the next generation Unified Virtual Memory link with Gen 2.0 Cache coherency features and 5 – 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high performance GPUs currently face. One of the latest things we learned about NVLINK is that it will allow several GPUs to be connected in parallel, whether in SLI for gaming or for professional usage. Jen-Hsun specifically mentioned that instead of 4 cards, users will be able to use 8 GPUs in their PCs for gaming and professional purposes.
With Pascal GPU, NVIDIA will return to the HPC market with new Tesla products. Maxwell, although great in all regards was deprived of necessary FP64 hardware and focused only on FP32 performance. This meant that the chip was going to stay away from HPC markets while NVIDIA offered their year old Kepler based cards as the only Tesla based options. Pascal will not only improve FP64 performance but also feature mixed precision that allows NVIDIA cards to compute at 16-bit at double the accuracy of FP32. This means that the cards will enable three tiers of compute at FP16, FP32 and FP64. NVIDIA’s far future Volta GPU will further leverage the compute architecture as it is already planned to be part of the SUMMIT and Sierra super computers that feature over 150 PetaFlops of compute performance and launch in 2017 which indicates the launch of Volta just a year after Pascal for the HPC market."
I don't believe that Jen-Hsun had current gamer PCs (and most current Supermicro, Tyans or other similar bigger-iron) in mind, unless there'll be a chassis add-on for current systems that allows easy support for 8 GPUs - currently something which most of us with 8 or more GPUs per system have had to cobble together with risers, splitters and the like. Moreover, NVLINK's initial release appears to be IBM Power (and maybe ARM-64) processor centric. So a lot depends on how NVLINK (which is responsible for one-half of that 10x performance increase) is implemented. In sum, desire for research/manufacturing cost recovery and manufacturer greed may relegate the highest performance to computer systems not yet released and whose cost will be extremely high. Thus, I'm casting my prediction closer to Refracty's and Glimpse's - max performance will likely be had only by those with deep (but fully filled) pockets. In any event, I'm not being a sour puss because I've never seen before even a 2x increase per GPU processor in rendering speed and that appears to be soon clearly attainable even on our present hardware.
P.S.
(1) There could be external box add-ons, but to my knowledge the prices of those too, in the past, have caused nose bleeds.
(2) Just as I have urged caution/care in spending a lot on Maxwells with Pascals coming around the corner, I'd also urge caution/care in high end spending to buy any current system until we know how NVLINK will be hardware implemented. So get a current system only if you really need it before then, but if you can wait you might be better off. Newer systems might make issues we've previously discussed, in detail, (like those involving Octane 3 changes from v2) relics of past systems. However, I also recognize that there'll certainly be new ones to discuss.
(3) If you need a high end multi-GPU rendering system before (or even after) Pascal NVLINK hardware is released, you may want to consider, among others, a SuperX10DRG-OT+-CPU-MNL-1780 [ http://www.supermicro.com/products/syst ... GR-TRT.cfm ] ($3250 @SabrePC) + 2 x E5-2620 v3s for ~ $400 each from Superbiiz [ http://www.superbiiz.com/detail.php?name=E52620V3BX ]. The SuperX10DRG-OT+-CPU has 8 PCI-E 3.0 x16 (double-width) slots plus 2 PCI-E 3.0 x8 (in double-width x16 size) slots. You'll likely need to purchase storage, ram and additional PSU(s) and GPUs + you'll have to install the ram, storage, CPUs, GPUs and connect the external PSUs to GPUs.
(4) I'll be regularly using - for Google searches - phrases such as " NVLINK capable motherboard and NVLINK GPU enclosure/box/case."
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.