Best Practices For Building A Multiple GPU System

Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote: ...
I've seen speculative web articles saying that the new Nvidia GPU ("1080", "Titan Y"?) may have 16GB VRAM (some sites even claim 32GB VRAM), and I've seen claims of as many as 6,144 CUDA cores.
We'll find out April 5th. ...
More likely, though, the earliest hard announcement for (what would in effect be) a Titan Y or 1080 Ti will come close to this year's end, after the move from GDDR5 to HBM2. The regular 1080 might come around June.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Are the new cards going to have different-shaped connectors? It sounds like we may be on the verge of something similar to the move from PCI to PCIe. That would seem to necessitate a whole new series of motherboards.

X11DRX?...
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote: Are the new cards going to have different-shaped connectors? It sounds like we may be on the verge of something similar to the move from PCI to PCIe. That would seem to necessitate a whole new series of motherboards.

X11DRX?...
My following guesstimates are based on my 30+ years of closely following computer technology, factoring in that technology changes faster and faster over time. When it comes to technology, the idea of a future-proof purchase is a pipe dream. But since I do strive for future-resistant purchases, I try to continually educate myself to anticipate what's coming around the bend.

For the next few years, most newer gamer cards*/ will likely accommodate our slots, i.e., the connectors will be the same. There are too many such machines in place for manufacturers to expect immediate wholesale replacement of all of them. I do believe, though, that all bets on the current gamer card form factor may be off for the decade of the 2020s (Nvidia and other GPU manufacturers can say, "We warned you back in 2015-2016 that change was afoot."). Another factor that will likely accompany the change is the abandonment of the current CPU silicon die architecture, because CPU die shrinks will soon hit a wall after 7nm [ https://en.wikipedia.org/wiki/7_nanometer ] (Intel, IBM and the other CPU manufacturers can say the same).

If past strategies/responses have any life left in them, in the meantime it wouldn't surprise me if either or both of the following happen: (1) innovations are released that allow current systems to use technologies such as NVLINK via in-case and/or ex-case add-ons/hacks**/, and/or (2) newer gamer-level motherboards are released to realize Jen-Hsun's mention that gamer users will be able to use 8 GPUs in their PCs for gaming and professional purposes [without such users having to undergo the hacks that we're becoming accustomed to making]. Thus, I foresee a period of a few years where there is a mix of architectures. In fact, it wouldn't surprise me to see motherboards that begin to accommodate some of these newer technologies drop within the next couple of years, with "Nvidia" or the names of one or more of its partners emblazoned thereon.

As newer CPU, memory, storage and GPU technologies and fabrication/manufacturing processes are ironed out and mature, their byproducts will trickle down from the so-called professional/big-business level [where there are fewer, but larger-quantity, purchasers] to take hold fully at the gamer level [where there are vastly more, but smaller-quantity, purchasers], and for less well-heeled purchasers that byproduct of greed is welcome.

Regarding whether the presumed next X--DRX {what you call the X11DRX} (which I suppose will drop after Intel's next CPU tock***/) accommodates NVLINK natively, I rather doubt it. It's not that Supermicro couldn't do it, but it would represent a dramatic break from Supermicro's past behavior. One thing I have noticed is that Supermicro is slower to adopt newer technologies in its mid- to low-end offerings than most other comparable sources, because Supermicro usually waits for a technology to mature before committing to its adoption in what I would call mid- to low-end systems. To my thinking, the X--DRX falls in the mid- to low-end of Supermicro's offerings [the X9-DRX board now costs about $450 and the X10 costs about $200 more {both of which probably cost no more than your gamer motherboard did}]. Moreover, Supermicro most likely didn't intend the X--DRX to be used for GPU-heavy systems [that's why it doesn't have a telltale double-slot layout like the SuperX10DRG-OT+-CPU-MNL-1780 that I mentioned earlier {see P.S. No. 3 in my last post on page 43 [ viewtopic.php?f=40&t=43597&p=268417#p268417 ]}], although the X(9)(10)DRXs are being used for that purpose, i.e., the poor person's GPU-centric system. So it would surprise me if the X11DRX came with NVLINK anytime soon. NVLINK will likely be introduced first by Supermicro in systems that Supermicro intends to be used as GPU-centric systems - like the SuperX10DRG-OT+-CPU-MNL-1780.

*/ Just so there's no confusion or misunderstanding, here, as elsewhere in any forum in which I've posted or will post, I use the term "gamer cards" not disparagingly, but simply to denote what are now called GTX GPUs, which are more reasonably priced and best suited for those engaged in GPU rendering of visual arts.
**/ There will likely be some performance decrease compared to the real thing, but, as in the past, a hack may be better than no hack at all.
***/ Also, keep in mind that NVLINK has a CPU link component [ https://www.youtube.com/watch?v=RBf8FLS6q8E ], so for it to be fully realized (and so far only IBM and possibly ARM fit the bill), Supermicro (and any other manufacturer) would have to adopt IBM- or ARM-compliant CPUs, or Intel would have to release a compliant CPU. But there might be a scaled-down NVLINK minus the high-speed CPU-to/from-GPU transfers.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Tutor, what is your current GPU lineup?

I am thinking the new Pascal card won't really make much of a difference render-wise for me, given that my Zs cap me at 6GB VRAM, unless it sports a heavy-duty core clock increase over the Titan X.
Many times I run solo with just the primary GPU, so if it's only 16GB VRAM, I'm not too excited....
However, if it is 32GB VRAM (like Tom Glimpse says it will be), then I am interested....
Or am I?....
32GBVRAM.jpg
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote: Tutor, what is your current GPU lineup?
In my 3D GPU CUDA rendering systems I currently use GTX (1) 480s (1.5G), (2) 580s (3G), (3) 590s (1.5G/GPU), (4) 680s (4G), (5) a 690 (2G/GPU) (only 1 of these), (6) 780s (6G), (7) 780 Tis (3G), (8) Titans (6G), (9) a Titan Black (6G) (only 1 of these), and (10) Titan Zs (6G/GPU). As indicated in my signature, together they provide the render equivalent (RE) of a little more than 86 GTX 980s, or the Maxwell equivalent (ME) of 180,000+ CUDA cores. I use (1) the cards with less than 3G for bucket rendering using Thea Render or for smaller-format renders using other software, and (2) the GT 640 4Gs for interactivity - I use them for final 3D renders only in special situations. I use my 13,000+ AMD/ATI GPU processing units for OpenCL 3D/video rendering, and I have 206+ CPU cores for CPU 3D rendering. I sat out the Maxwells, somewhat patiently, awaiting 6K+ core CUDA cards, which seem to be near at hand with Pascal.
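For anyone curious how I arrive at figures like "86 GTX 980s" or "180,000+ ME cores", here is a rough sketch of the kind of tally I mean, in Python. The card counts and per-architecture weights below are illustrative placeholders only - not my actual inventory, and not measured benchmarks:

# Sketch: fold a mixed-architecture CUDA inventory into a rough
# "Maxwell equivalent" (ME) core count. The weights guess how one
# Fermi/Kepler core compares to one Maxwell core at rendering;
# substitute your own benchmark-derived values.
inventory = [
    # (model, cores_per_gpu, gpus_per_card, card_count, weight_vs_maxwell)
    ("GTX 580",      512, 1, 2, 0.5),   # counts and weights: examples only
    ("GTX 590",      512, 2, 2, 0.5),
    ("GTX 780 Ti",  2880, 1, 1, 0.8),
    ("GTX Titan Z", 2880, 2, 1, 0.8),
]
me_cores = sum(cores * gpus * count * weight
               for _, cores, gpus, count, weight in inventory)
print(f"ME CUDA cores: {me_cores:,.0f}")
print(f"GTX 980 equivalents: {me_cores / 2048:.1f}")   # GTX 980 = 2048 cores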
Notiusweb wrote: I am thinking the new Pascal card won't really make much of a difference render-wise for me, given that my Zs cap me at 6GB VRAM, unless it sports a heavy-duty core clock increase over the Titan X.
Many times I run solo with just the primary GPU, so if it's only 16GB VRAM, I'm not too excited....
However, if it is 32GB VRAM (like Tom Glimpse says it will be), then I am interested....
Or am I?....
32GBVRAM.jpg
The top-of-the-line GTX Pascals will likely be at least twice as fast at 3D rendering as a GTX Titan X (after the software is properly tuned to run Pascals, if necessary, and if the CUDA core count is doubled, as the prevalent rumor holds), and the top-of-the-line GTX Pascal is rumored to have 16G of RAM. From what I've read about the Pascals, the 32G models will be priced at Tesla levels or higher. Those prices will gag most of us; but I'm sure that you've already seen the big picture, literally [ download/file.php?id=50947&mode=view ].
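That "twice as fast" figure is nothing fancier than first-order scaling, assuming render throughput tracks cores times clock (real-world tuning will bend that line). A minimal sketch; the Pascal core count is the rumor above, and the clocks are pure placeholders, not confirmed specs:

# Sketch: naive cores-x-clock speedup estimate. Rumored/placeholder
# numbers, not confirmed specs.
titan_x_cores, titan_x_mhz = 3072, 1000   # GTX Titan X (Maxwell), ~1000 MHz base
pascal_cores,  pascal_mhz  = 6144, 1000   # rumored core count; clock is a guess
speedup = (pascal_cores * pascal_mhz) / (titan_x_cores * titan_x_mhz)
print(f"Estimated speedup: ~{speedup:.1f}x")   # ~2.0x at equal clocks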
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Tutor, you weren't specifying a quantity within the parentheses, right?
I.e., with "(10) Titan Zs", you just meant you have some number of Titan Zs > 1, not that you have 10 Titan Zs, correct?

I was adding 1+2+3+4+5+6+7+8+9+10 = 55, and I was like,
"Wooihaahhaaaasooopiiooiiiiiooooo!!!" (Elven for "Holy Smacktoids...")

Do you separate your cards into groups by VRAM, so as not to cap, let's say, a 6GB card with a 1.5GB card?
Or is it just a clock arrangement to get each rig group to a certain speed?
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote: Tutor, you weren't specifying a quantity within the parentheses, right?
I.e., with "(10) Titan Zs", you just meant you have some number of Titan Zs > 1, not that you have 10 Titan Zs, correct?
Correct - I was just numbering the different models of GPUs.
Notiusweb wrote: Do you separate your cards into groups by VRAM, so as not to cap, let's say, a 6GB card with a 1.5GB card?


My usual preference is to match VRAM, speed and model; but if I need more GPU power in one or more particular system(s) for a particular job, I don't hesitate to mix and match to suit the particular need.
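The reason matching VRAM matters: in most GPU renderers each card must hold the entire scene by itself (VRAM isn't pooled across cards), so one 1.5G card in a pool caps what that whole pool can take on. A minimal sketch of the sorting I do in my head, with made-up card names and sizes:

# Sketch: only dispatch a job to cards whose VRAM fits the scene,
# since per-card VRAM is not pooled in most GPU renderers.
gpus = [("Titan Z - GPU A", 6.0), ("Titan Z - GPU B", 6.0),
        ("GTX 580", 3.0), ("GTX 480", 1.5)]   # (name, VRAM in GB) - examples
scene_gb = 2.5
eligible = [name for name, vram in gpus if vram >= scene_gb]
print("Render on:", ", ".join(eligible))      # the 480 sits this one out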
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

My rig has 1 Titan X (3,072) + 6 Titan Zs (5,760 x 6 = 34,560), for a total of 37,632 CUDA cores.
If I take 37,632 / 13 GPU processors, that gives me about 2,894 per processor, and 180,000 / 2,894 then gives 62 processors. Given you have some dual-GPU cards in there, that 55 number isn't that far off, is it? (Quick sanity-check sketch below.)

Ergo,
Wooihaahhaaaasooopiiooiiiiiooooo!!! (Holy Smacktoids!)
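And the promised sanity check, in Python (comparing my raw Kepler cores against Tutor's Maxwell-equivalent 180K is apples-to-oranges, but close enough for forum math):

# Sanity check of the arithmetic above.
titan_x = 3072                   # 1x GTX Titan X
titan_z = 5760                   # per Titan Z card (2 GPUs x 2880 cores)
total   = titan_x + 6 * titan_z  # 37,632 CUDA cores
per_gpu = total / 13             # 13 GPU processors (1 + 6*2)
print(total, int(per_gpu), int(180_000 / per_gpu))   # 37632 2894 62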
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote: My rig has 1 Titan X (3,072) + 6 Titan Zs (5,760 x 6 = 34,560), for a total of 37,632 CUDA cores.
If I take 37,632 / 13 GPU processors, that gives me about 2,894 per processor, and 180,000 / 2,894 then gives 62 processors. Given you have some dual-GPU cards in there, that 55 number isn't that far off, is it?

Ergo,
Wooihaahhaaaasooopiiooiiiiiooooo!!! (Holy Smacktoids!)
I have 12 ATI cards (dual-processor and single-processor cards) dedicated to OpenCL chores. My 24 systems consist of 17 CUDA-capable systems, 3 OpenCL-only systems (i.e., ATI - no CUDA GPU processors therein) and 4 systems (with old ATI cards) dedicated solely to CPU rendering.

Excluding my seven GT 640 4Gs, the number of CUDA cards in that list of ten models {in my earlier post} is 80 [23 of which are GTX 590s (dual processor), 6 of which are Titan Zs (dual processor) and one of which is the GTX 690 (dual processor)]. So 30 of the 80 CUDA cards are dual-GPU-processor cards, and thus the current total number of CUDA GPU processors that I own and use for 3D rendering is 110 [80 + 23 + 6 + 1], unless I also use the GT 640 4Gs in final renders (their CUDA rendering prowess is meager). My two GTX 295s (dual-processor cards) are used solely for AE (older version) rendering.

Of course, I can and do use CPU rendering on all of my systems (4-core to 32-core / single, dual and quad CPU systems), sometimes simultaneously with GPU rendering. Fortunately, some of my software facilitates hybrid rendering (GPU and CPU rendering of the same frame) [ https://www.thearender.com/site/index.p ... u-cpu.html ], and some of my software has very flexible, low-cost, emergency-needs-based acquisition options [ http://furryball.aaa-studio.eu/news/fur ... edits.html ], in addition to not having any per-license GPU processor cap.
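Since the card-versus-processor counting trips people up, the tally above reduces to a few lines (numbers straight from this post):

# Tally: 80 CUDA cards, of which 23 GTX 590s + 6 Titan Zs + 1 GTX 690
# carry two GPU processors each.
total_cards = 80
dual_gpu    = 23 + 6 + 1                       # = 30 dual-processor cards
processors  = (total_cards - dual_gpu) + dual_gpu * 2
print(processors)                              # 110 CUDA GPU processors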
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

You can network 110 GPUs for 180K CUDA on FurryBall?... LOL
Imagine that in one rig, all watercooled.
Do they (F-Ball) have some sort of .OCS/.ORBX-like export so you can get a project from the main PC to all the render PCs? Or does each PC have its own independent creation and render workflow?

Hopefully with GPU speed increases we'll hit a real-time snap-render kind of thing, like 4K at 16,000 samples in under a second per frame, for big rigs like yours and mini rigs like mine.
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise