Best Practices For Building A Multiple GPU System

Discuss anything you like on this forum.
philliplakis
Licensed Customer
Posts: 79
Joined: Wed Nov 19, 2014 2:27 am
Location: Sydney, Australia

Tutor wrote:
philliplakis wrote:https://www.dropbox.com/s/jo50xuejys25vdk/GTX%20Titan%20780%20Ti%20Sigle%20Slot%20Mod%202.0.pdf?dl=0

There's a PDF of how to remove it. It was sent to me by the guy who was selling single-slot brackets, in case anyone wanted to know...

Thanks,

I'll be using this guide in the future.

I have a networking question I'm hoping you could shed some light on.
I haven't networked more than 2 computers before, and I used direct Ethernet between my Macs, which was simple enough.

My query is: can I have a network "daisy chained"? For example, one port from my Mac into another Mac, then from that Mac to another. I've attached a diagram, as I'm probably being confusing...



Will they all communicate to each other?
Win10 - 6700k - 32gb Ram - GTX Titan X (Pascal) + GTX 1080
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

philliplakis wrote:
Tutor wrote:
philliplakis wrote:https://www.dropbox.com/s/jo50xuejys25v ... 0.pdf?dl=0

There's a PDF of how to remove it. It was sent to me by the guy who was selling single-slot brackets, in case anyone wanted to know...

Thanks,

I'll be using this guide in the future.

I have a networking question I'm hoping you could shed some light on.
I haven't networked more than 2 computers before, and I used direct Ethernet between my Macs, which was simple enough.

My query is: can I have a network "daisy chained"? For example, one port from my Mac into another Mac, then from that Mac to another. I've attached a diagram, as I'm probably being confusing...



Will they all communicate to each other?
Like you, I haven't networked more than two of my MacPros by direct connection. I've also done it with two of my Windows computers and, separately, with a MacPro directly connected to a Windows computer. I recommend that you go ahead and try networking all four of them (there's little risk of loss, except for your time), using the information set forth here [ http://www.techrepublic.com/blog/apple- ... connected/ ] to resolve any issues. Please let me know whether it works and how it works best. I'm curious because I also have four MacPros [3x2007 MacPros and 1x2009->2010/12 MacPro] and intend to soon put them all, for the first time, in the same work area. Maybe I'll then give it a try.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
thunderbolt1978
Licensed Customer
Posts: 11
Joined: Wed Mar 11, 2015 9:20 am

Hi,

Short update: I got my Amfeltec GPU splitter yesterday. If I put the splitter into my Supermicro board, I can use only 5 Titan X cards; with the 6th attached, the board does not start.

With 5 Titan X cards in the board directly, I got an OctaneBench score of 640; with the Amfeltec splitter, I get 585. The splitter's connection is PCIe 2.0 x1, so 500 MB/s for 4x 12 GB... it takes a while. Another problem: if I render a frame range, only the first frame renders; after that, Octane crashes and creates a .dmp file. Same in Redshift. Maybe it's the board. Today I'll test this with the Supermicro SR-X mainboard.

Kind regards
thunderbolt
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Is the splitter you've bought this one?
http://amfeltec.com/products/flexible-x ... -oriented/
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
thunderbolt1978
Licensed Customer
Posts: 11
Joined: Wed Mar 11, 2015 9:20 am

No, it was this one:

http://amfeltec.com/products/gpu-oriented-cluster/

Setting all cards to TCC mode, Redshift works now; testing with Octane.

Update...

With TCC mode, the 4 cards work...

Setup:
main Titan X: WDDM
other cards: TCC

Kind regards
thunderbolt
philliplakis
Licensed Customer
Posts: 79
Joined: Wed Nov 19, 2014 2:27 am
Location: Sydney, Australia

thunderbolt1978 wrote:Hi,

If I put the splitter into my Supermicro board, I can use only 5 Titan X cards; with the 6th attached, the board does not start.
I was having the same situation with my PC. It was the new Alienware with a 5820K; I've since sold it for that reason...
Couldn't boot with more than 5 cards in any configuration with the Amfeltec.

Hoping to build an X79 rig with the Asus WS mobo to see if I can get 7-9 cards.
Win10 - 6700k - 32gb Ram - GTX Titan X (Pascal) + GTX 1080
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

thunderbolt1978 wrote:No, it was this one:

http://amfeltec.com/products/gpu-oriented-cluster/

Setting all cards to TCC mode, Redshift works now; testing with Octane.

Update...

With TCC mode, the 4 cards work...

Setup:
main Titan X: WDDM
other cards: TCC

Kind regards
thunderbolt
Special thanks, Thunderbolt1978,*/ for your last post, for it jogged my aged memory. I had intended, but had forgotten, to post the following earlier:

For those unfamiliar with TCC, here's some additional information (I've bolded certain benefits that may be of special interest to those doing 3D rendering):

"About TCC

The TCC (Tesla Compute Cluster) driver is a Windows driver that supports CUDA C/C++ applications. The driver enables remote desktop services, and reduces the CUDA kernel launch overhead on Windows. Note that the TCC driver disables graphics on the Tesla products.

The main purpose of TCC and the Tesla products is to aid applications that use CUDA to perform simulations, and large scale calculations (especially floating-point calculations), such as image generation for professional use and scientific fields of study.

The benefits of using the Tesla Compute Cluster driver package:

TCC drivers make it possible to use NVIDIA GPUs in nodes with non‐NVIDIA integrated graphics.
NVIDIA GPUs on systems running the TCC drivers will be available via Remote Desktop, both directly and via cluster management systems that rely on Remote Desktop.
NVIDIA GPUs will be available to applications running as a Windows service (in Session 0) on systems running the TCC drivers.
The TCC driver was specifically designed to be used with Microsoft's Windows HPC Server 2008. However, NVIDIA's TCC driver can be used with operating systems other than Windows HPC Server 2008. The NVIDIA TCC driver does not have the same pinned allocation limits or memory fragmentation behavior as WDDM. You can mix TCC drivers with XP-style display drivers.

For more information about supported operating systems, and compatibility with other NVIDIA drivers, refer to the documentation on NVIDIA Tesla:
http://www.nvidia.com/object/tesla_comp ... tions.html
For more information about NVIDIA hardware compatibility on Windows HPC Server 2008, see:
http://technet.microsoft.com/en-us/libr ... S.10).aspx
To search the NVIDIA web site for Tesla drivers, see:
http://www.nvidia.com/Download/index5.aspx?ptid=7 " [ http://http.developer.nvidia.com/Parall ... luster.htm ] [Emphasis added.]

Note well: (1) The first and third bolded benefits may help increase the maximum number of usable GPUs, and, yes, it works with non-Tesla cards, i.e., GTX cards.
(2) Read this, especially noting the comments: http://johanneshabich.blogspot.com/2010 ... -with.html .

*/ Thunderbolt1978, please tell us how you activated TCC mode { i.e., did you use the method set forth here: [ https://www.google.com/url?q=http://htt ... dXtwdsG99Q ] } and on what version of Windows.
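
For anyone who wants to experiment before Thunderbolt1978 replies, the route I've seen documented is nvidia-smi's driver-model switch. This is a sketch based on my assumption about the flag, so verify it against NVIDIA's nvidia-smi documentation for your driver version. The snippet only composes the command; run the printed command yourself from an elevated Windows prompt, then reboot:

```shell
# Sketch only: compose the usual nvidia-smi driver-model command.
# "-dm 1" selects TCC, "-dm 0" selects WDDM (assumed flag - check your
# driver's nvidia-smi documentation). TCC disables graphics output, so
# the display card must stay in WDDM mode.
GPU_INDEX=1     # a render-only card; the display card is usually index 0
MODE=1          # 1 = TCC, 0 = WDDM
CMD="nvidia-smi -i $GPU_INDEX -dm $MODE"
echo "$CMD"     # run this from an elevated prompt, then reboot
```

Repeat per render-only GPU index; after the reboot, the card should no longer appear as a display device.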
Last edited by Tutor on Tue Sep 22, 2015 4:42 pm, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Guys,

This is a gold mine!

Thank you so much for sharing your knowledge.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

smicha wrote:Guys,

This is a gold mine!

Thank you so much for sharing your knowledge.

I just edited the footnote for my last post, so you may want to re-read it.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Now that we've reached page 30, I think a Best Practices recap is in order for newcomers. So here it is:

09/23/2015
Top Ten Honey Bees

Honey, Be Curious
1) Try to stay abreast of current and upcoming technological developments involving motherboards, PSUs (power supplies), CPUs (central processing units) and GPUs (graphics processing units). For example, for CPUs you can get specs from http://ark.intel.com for details such as the number of PCIe lanes supported. You can compare specifications of released GPUs at https://en.wikipedia.org/wiki/List_of_N ... sing_units and you can get a glimpse of what GPUs are planned for the future here - http://videocardz.com and here - http://wccftech.com . You can get information from the website of the manufacturer or seller of your motherboard, including the number of PCIe lanes the motherboard supports.

Honey, Be Knowledgeable
2) Download, convert to PDF and read the manual(s) before purchasing any component, especially the motherboard, CPU(s) and GPUs, paying particular attention to (a) any schematics and diagrams for the motherboard’s layout of the PCIe slots and lanes and (b) especially all PCIe bios settings (and above all else an option to allocate more resources to PCIe devices such as “Above 4G” functionality - this functionality is very important to have).

Honey, Be Analytical
3) Assess your current (and try to assess your future) needs before you purchase any hardware. The more systems you have to use for 3D rendering, the more software licensing costs rise. Keep in mind that OctaneRender’s license has a 12-GPU-processor limit per licensed system. The word “processor” is key because a dual-processor video card such as a GTX 590, GTX 690 or GTX Titan Z counts as two GPU processors.
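
The counting rule above can be sketched as quick shell arithmetic (the card counts here are hypothetical examples, not anyone's actual rig):

```shell
# Sketch of the license arithmetic: a dual-processor card
# (GTX 590, GTX 690, Titan Z) counts as two GPU processors.
SINGLE_GPU_CARDS=10
DUAL_GPU_CARDS=2               # e.g. two Titan Zs
LICENSE_LIMIT=12               # OctaneRender's per-system cap
PROCESSORS=$(( SINGLE_GPU_CARDS + 2 * DUAL_GPU_CARDS ))
echo "$PROCESSORS"             # 14 - over the cap with only 12 physical cards
```

So twelve physical cards can still exceed the twelve-processor cap if any of them are dual-processor models.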

Honey, Be Observant
4) As to GPU potential, note the number of PCIe slots on the motherboard, and don’t forget to count the smaller slots, such as the x4 and x1 slots - the more slots there are, the more likely the motherboard will support more GPUs. But that isn’t always the case; there isn’t necessarily a one-to-one correspondence between the total number of card slots and the number of GPUs supported. Many times, the slot count is just the minimum number of GPUs supported; at other times, the number of slots of all sizes is more than the number of GPUs supported, particularly if you select a CPU that doesn’t support as many lanes as the motherboard does. So if you ask, "How do I connect more GPUs than there are slots?", my answer is "Honey, be patient and don't ignore the footnote, below."
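
As a rough sketch of the lane arithmetic (the figures are illustrative assumptions, not a spec for any particular board):

```shell
# Illustrative lane budget. Rendering is compute-bound, so narrow
# (x4, even x1) links are workable, which is why lane count rather
# than slot count often sets the practical ceiling.
CPU_LANES=40          # e.g. a 40-lane CPU; confirm yours on ark.intel.com
LANES_PER_GPU=4       # assume each GPU is run at x4
MAX_GPUS_BY_LANES=$(( CPU_LANES / LANES_PER_GPU ))
echo "$MAX_GPUS_BY_LANES"
```

On a seven-slot board, a budget like this suggests the lanes, not the slots, would run out last - which is where risers and splitters come in.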

Honey, Be For Real
5) For the near future, GPUs must be connected to a motherboard. Currently, Supermicro [ http://www.supermicro.com/index_home.cfm ] and Tyan [ http://www.tyan.com ] server motherboards support larger GPU installations than gamer/enthusiast motherboards. Supermicro server motherboards are now the better of those two brands in terms of supporting more GPUs, but this may not always be the case.

Honey, Be Consistent
6) Try to use GPUs all of the same type/model number. Some GPUs in one family, such as Maxwell, may have higher input/output (IO) memory requirements than some GPUs in another family, such as Kepler. Higher IO memory ("IOMem") requirements mean fewer successful GPU installs. This isn’t to say that you can’t mix it up; it’s just that mixing it up may bring its own baggage which, at times, can get very heavy to carry as you add more and more GPUs.

Honey, Be Cool
7) GPUs, as well as other computer system components, emit heat. The more GPUs you install, the more heat will be emitted. Do not neglect to take into account the cooling needs (including peaks) of your system, especially those of your GPUs. Water cooling is more effective, but costlier, than air cooling, and there are other pros and cons such as leaks and noise level. Adequate GPU card spacing is important. Consider using backplates on your GPUs, particularly where the back of the card has memory modules, and assess the impact of backplates on GPU spacing. Also, don’t neglect environmental cooling options such as air conditioning, because whether you cool GPUs by air or water, that heat will ultimately be emitted somewhere nearby.
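
For sizing that air conditioning, the watt-to-BTU conversion is straightforward (1 W of sustained load is roughly 3.412 BTU/h of room heat; the load figure below is a hypothetical example):

```shell
# Nearly every watt the rig draws ends up as heat in the room.
GPU_LOAD_W=1000                               # e.g. four 250 W cards
BTU_PER_HOUR=$(( GPU_LOAD_W * 3412 / 1000 ))  # 1 W ~= 3.412 BTU/h
echo "$BTU_PER_HOUR"
```

So a 1,000 W GPU load alone needs on the order of 3,400 BTU/h of cooling capacity, before counting the CPU and PSU losses.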

Honey, Be Both Safe and Powerful
8) GPUs, as well as other computer system components, consume electricity. The more GPUs you install, the more electricity will be consumed. Do not neglect to take into account the power needs (including peaks) of your system, especially those of your GPUs. Depending on how you define “massive,” you may well require more than one power supply, and another circuit into which the supplementary PSU(s) will be plugged. If you use only Octane and do not overclock your GPUs, those power needs are easier to determine, because Octane alone will not usually push your GPUs past the manufacturer’s Thermal Design Power (TDP). Thus, you need to know the TDP of your video cards. You can find the TDP here - https://en.wikipedia.org/wiki/List_of_N ... sing_units - if it’s not otherwise available to you. If you use your GPUs for other applications, or you overclock them, you should try to determine how that affects power requirements by using power meters. Also, assess your structure’s electrical system’s capabilities and related factors such as the need for surge protectors, ground fault protection and the wire gauges in electrical cables. Be aware when selecting wire gauges that lower numbers are better than higher numbers, e.g., 12 gauge safely carries more current than 14 gauge.
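
Putting the TDP advice into numbers, here is a rough sizing sketch - every figure is an illustrative assumption, not a recommendation for any particular build:

```shell
# Rough PSU sizing from TDPs (illustrative assumptions throughout).
CARD_TDP_W=250            # e.g. a Titan X (Maxwell); look up your card's TDP
NUM_CARDS=4
SYSTEM_OVERHEAD_W=300     # CPU, drives, fans - a loose estimate
HEADROOM_PCT=20           # keep the PSU well under its rated output
GPU_LOAD_W=$(( CARD_TDP_W * NUM_CARDS ))
TOTAL_W=$(( GPU_LOAD_W + SYSTEM_OVERHEAD_W ))
PSU_MIN_W=$(( TOTAL_W * (100 + HEADROOM_PCT) / 100 ))
echo "$PSU_MIN_W"
```

With these numbers the sketch lands around 1,560 W, i.e., already past a single consumer PSU and into dual-PSU (and possibly second-circuit) territory.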

Honey, Be Agnostic/Flexible or Receptive
9) Generally, Linux will support more GPUs than Windows, and Windows will support more GPUs than MacOS. GPUs in Windows systems can be overclocked using the EVGA Precision X or MSI Afterburner utilities. In contrast, currently only the display card’s GPU can be over/underclocked if you’re using only Linux, and if you’re using MacOS, none of the GPUs can currently be over/underclocked. Emulators may, however, allow you, for example, to run Windows applications on a Linux system, but as with any emulation there are pros and cons. Even if you’re bound to one particular OS, at least be open to (indeed, I recommend that you seek out) recommendations from other users of that OS who do (or are able to) share optimizations.

Honey, Be Creative
10) Try to maintain a level of creativity, i.e., think outside the box - and I mean that literally as applied to your system’s case. Learn as much as you can about GPU connection aids such as riser cables and cards, splitters, and external chassis*/ [see, e.g., http://amfeltec.com/# ] (and their power needs, placement restrictions/options and other limitations/advantages), and use them where you have satisfied yourself that they will aid the attainment of your system goal(s).

*/ Devices such as splitters and external chassis allow one to exhaust a system's GPU capacity when a motherboard's slots aren't enough to do so.


N.B. Honey Bees - these best practices aren’t cast in stone - they're just intended to make honey gathering a bit easier.

P.S. See my post(s) on page 31 for more and revised "Best Practices."
Last edited by Tutor on Fri Sep 25, 2015 5:13 am, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.