Best Practices For Building A Multiple GPU System

Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

glimpse wrote: ... .

noticed this "nonsense" - I mean, their GPU Cluster can hook up to 16 cards (in a four-stand configuration with the proper host card), but why on earth did they make it only x1???

That splitter is way more interesting, but again, those wires could be slightly longer... Overall a nice idea =)

I've been thinking of hooking a splitter to an x4 slot & placing the cards on the backside of the case, mounted like fins (not at 90 degrees to the side of the case, but more like 30...)

Glimpse, Amfeltec responded to my request for longer splitter cables by saying that they could make me a cable up to 20" long, but that as the length exceeds 12" it becomes more likely that the GPUs won't work consistently. That's what led me to devise the following 8x GPU cage, so that I can get the GPUs' PCIe connectors as close as possible to the PCIe splitter cards.

Waste Not; Want Not

...is what I heard as a child. Also, because I'm concerned about how waste affects our environment, I've decided to recycle a computer tower case that I've had since the mid-1990s and mod it into my first 8-GPU external cage. I'm also cheap.

Mr. Dremel and I will be attending a mod party after daybreak and we'll be cutting up. You're all invited.

Six well-spaced double-wide GPUs can be hung on the long side, and one double-wide GPU can be hung on each of the short sides at the base of the other six. The first three pics show where I am before getting some sleep; the next two pics (the last one is actually a double pic within one) show what the end result should resemble. The areas that I've whited out are the parts that we'll be removing. To tie the GPUs into my systems, I'll be using the x4 Amfeltec GPU splitters that I mentioned in my last post.

I'll also secure a mat to the bottom of each cage to help prevent carpet lint from being sucked up into the GPUs or my external PSU.

After this project is completed, I'll flip the bottom half of the case upside-down and we'll use it to build another 8-GPU external cage.

Finally, I'll recycle the case's cover to build yet another 8-GPU external cage. All three of these cages will be used for systems that I've built, and thus won't sit far off the floor.

All of my cMacPro external chassis must be taller, given the location of the PCIe bay and the 12 inch length of the splitter cables. Luckily, I have more old tall cases to recycle.


P.S. - I intend to place the splitter base card in my 4xE5-4650 Supermicros in the end PCIe slot nearest the case door, and to cut a narrow rectangular hole through the door just large enough to pass the PCIe female connectors through and attach them to the GPUs in the cage. The cage will sit up against the door, with the GPUs' male PCIe connectors oriented closest to the door.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

I've been attending the NAB 2015 Show in Las Vegas, Nevada, from April 13th to 16th. A local commented, "Las Vegas is either windy and cool, windy and warm, or windy and hot." I've experienced all three almost every day here. But what has really blown me away is a High Density Compute Accelerator from One Stop Systems [ http://www.onestopsystems.com ], sold through http://www.maxexpansion.com . It's said to add 139.8 Tflops of computational power using 16 Nvidia K80 GPUs at 8.74 Tflops each. The system has:
1) Four PCIe x16 Gen3 Adapter Cards and Cables (that attach to a 3U rackmount chassis) and
2) Four Integrated Canisters - each with a front-mounted intake fan - that cool up to four PCIe x16 Gen3 GPUs per canister (4 canisters x 4 GPUs per canister = 16 GPUs).

The system is powered by three 3,000 watt redundant PSUs.

It can be loaded with up to 16 of the following:
1) Teslas;
2) Grid GPUs;
3) Intel Xeon Phi Coprocessors;
4) GTX GPUs; and
5) AMD GPUs.

The GPUs can be mixed to the extent that they can be mixed in a single computer system if all four PCIe adapters are placed in one system; but it also appears that they can be mixed to an even greater extent, because the PCIe adapter cards appear capable of being connected to separate host systems simultaneously.
Last edited by Tutor on Fri Apr 17, 2015 1:35 pm, edited 2 times in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm
Contact:

hehe =) seen those a while ago =) insane density!!!
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

glimpse wrote:hehe =) seen those a while ago =) insane density!!!
The rep stated that some recent Supermicros are the only ones with the IO space to handle these without issue.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

It appears that the number of GPUs per system is mainly dictated by the following factors:

1) system IO space size,
2) other system BIOS features (or their equivalents),
3) PCIe implementation of the particular motherboard,
4) the particular GPU and each GPU's IO space requirements,
5) the particular software application, and
6) the OS.

And don't forget that these factors may interact in non-obvious ways with one another.

Supplementation and examples:
Factors 1, 2, 3 and 4) Amfeltec [ "The motherboard limitation is for all general purpose motherboards. Some vendors like ASUS supports maximum 7 GPUs, some can support 8. All GPUs requesting IO space in the limited low 640K RAM. The motherboard BIOS allocated IO space first for the on motherboard peripheral and then the space that left can be allocated for GPUs. To be able support 7-8 GPUs on the general purpose motherboard sometimes requested disable extra peripherals to free up more IO space for GPUs. The server type motherboards like Super Micro (for example X9DRX+-F) can support 12-13 GPUs in dual CPU configuration. It is possible because Super Micro use on motherboard peripheral that doesn't request IO space." [Emphasis added] ] and other providers of PCIe expansion products caution that exceeding 7 to 8 GPUs per system is possible only with certain Supermicro systems, because only they have IO space sufficient to recognize GPU counts above 8 or 9. However, Tommes was able to get Octane to recognize only 7 of his 10 GPUs [ viewtopic.php?f=23&t=44209 ], despite the fact that the Windows OS running on his Supermicro X9DRX+-F recognized all ten of his GTX 780 Tis. But the GTX 780 Ti might present a special-case GPU - see the discussion of Factor 4, below.

Factors 2 and 5) A) OctaneRender's license [ see, e.g., https://render.otoy.com/shop/nuke_plugin.php ] states: " A maximum of 12 GPU's may be used. You will not attempt to circumvent the physical GPU or single machine license limit, including obfuscating or impairment of the direct communication between Octane and the physical GPUs, virtualization, shimming, custom BIOS etc.; [Emphasis added]" B) Redshift3d states: "Each instance of Redshift can currently use up to 8 GPUs concurrently. To take advantage of more than 8 GPUs on a single machine, you can launch multiple instances of Redshift each rendering a different job on a different subset of available GPUs."
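As an illustration of Redshift3d's suggestion (a sketch of my own, not taken from Redshift's documentation), one could partition the device indices and hand each render instance its own subset via the CUDA_VISIBLE_DEVICES environment variable; the command name and scene-file names below are placeholders, so substitute whatever GPU-selection mechanism your renderer actually provides:

[code]
# Hedged sketch: split a GPU list into per-instance subsets of at most 8 and
# launch one render process per subset. "redshiftCmdLine" and the .rs scene
# names are placeholders; check your renderer's own GPU-selection options.
import os
import subprocess

ALL_GPUS = list(range(12))          # e.g. a 12-GPU box
MAX_PER_INSTANCE = 8                # Redshift's stated per-instance limit

jobs = ["sceneA.rs", "sceneB.rs"]   # one job per instance (placeholder names)
subsets = [ALL_GPUS[i:i + MAX_PER_INSTANCE]
           for i in range(0, len(ALL_GPUS), MAX_PER_INSTANCE)]

for job, subset in zip(jobs, subsets):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(map(str, subset)))
    # Placeholder command; substitute the real renderer invocation.
    subprocess.Popen(["redshiftCmdLine", job], env=env)
[/code]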

Factor 3) My cMP2,1s can recognize and run stably only 4 GPUs (or 4 GPU processors = 2 x GTX 590), which matches the number of PCIe slots in the system.

Factor 4) Running eight GTX 780 Ti ACX SC OCs in my Tyan steals resources needed to recognize all of the system RAM that I've installed, and thus less RAM than the full installed amount is recognized. However, neither my eight GTX Titans nor my four Titan Zs (4 x 2 GPU processors) rob other system resources when installed in the same system, i.e., all of my system's RAM is recognized. Thus, I believe that certain GPUs consume more IO space than others [see the discussion of Tommes's issue under Factors 1, 2, 3 and 4, above].

Moreover, running 16 GTX GPUs, even at a TDP of 250w each, consumes a lot of power, and depending on the application each of them could be drawing 300w or more: 16 * 250w = 4,000w; 16 * 300w = 4,800w. Thus, a single 3,000w PSU could not adequately power all 16 such GTX cards, and very few PSU/electrical systems can handle 4,000w without special precautions being employed.
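To put rough numbers on that, here's a tiny back-of-the-envelope sketch. The per-card wattages are the TDP/draw figures discussed above; the 80% load ceiling per PSU is my own assumption, not a vendor spec:

[code]
import math

# Back-of-the-envelope PSU budgeting for a 16-GPU box (my own sketch).
def psus_needed(num_gpus, watts_per_gpu, psu_watts, load_ceiling=0.8):
    total = num_gpus * watts_per_gpu       # total GPU draw in watts
    usable = psu_watts * load_ceiling      # usable watts per PSU at the ceiling
    return total, math.ceil(total / usable)

for draw in (250, 300):
    total, count = psus_needed(16, draw, 3000)
    print(f"16 GPUs @ {draw}w = {total:,}w -> {count} x 3,000w PSUs at 80% load")
[/code]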

Factor 6) As to the OS, I do not yet know what the max GPU limit of OSX is, if any. I do know that Mavericks v. 9.2 has a 32 CPU-core limit. Moreover, everything I've read indicates that the best OSes for running large numbers of GPUs are:
1) Linux,
2) Windows, and
3) OSX,
in that order.


OctaneRender provides a cure for at least some cases of missing GPUs under Windows:
"Issue 9. Windows and the Nvidia driver see all available GPU's, but OctaneRender™ does not.

There are occasions when using more than two video cards that Windows and the Nvidia driver properly register all cards, but OctaneRender™ does not see them. This can be addressed by updating the registry. This involves adjusting critical OS files, it is not supported by the OctaneRender™ Team.

1) Start the registry editor (Start button, type "regedit" and launch it.)

2) Navigate to the following key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}

3) You will see keys for each video card starting with "0000" and then "0001", etc.

4) Under each of the keys identified in 3 for each video card, add two dword values:
DisplayLessPolicy
LimitVideoPresentSources
and set each value to 1

5) Once these have been added to each of the video cards, shut down Regedit and then reboot.

6) OctaneRender™ should now see all video cards.
" [ http://render.otoy.com/manuals/Standalone/?page_id=62 ]
I've had to resort to this technique to get both the system and Octane to recognize my eight GTX 780 TI ACX SC OCs in my Tyan. It worked successfully [ https://render.otoy.com/octanebench/sum ... GTX+780+Ti ].
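For anyone who'd rather script those steps than click through Regedit, here's a minimal sketch using Python's standard winreg module. It writes the two DWORD values named in the quoted manual text under each numbered per-GPU subkey; the iteration and filtering logic are my own assumptions, so review it before running (as Administrator) and reboot afterwards:

[code]
# Hedged sketch: apply the DisplayLessPolicy / LimitVideoPresentSources tweak
# from the quoted Octane manual text to every numbered display-adapter subkey.
# Windows only; run as Administrator; reboot for the change to take effect.
import winreg

CLASS_KEY = r"SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}"

def apply_gpu_registry_fix():
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CLASS_KEY) as class_key:
        index = 0
        while True:
            try:
                subkey_name = winreg.EnumKey(class_key, index)
            except OSError:
                break                      # no more subkeys
            index += 1
            if not subkey_name.isdigit():
                continue                   # only the "0000", "0001", ... per-GPU keys
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                CLASS_KEY + "\\" + subkey_name, 0,
                                winreg.KEY_SET_VALUE) as gpu_key:
                winreg.SetValueEx(gpu_key, "DisplayLessPolicy", 0, winreg.REG_DWORD, 1)
                winreg.SetValueEx(gpu_key, "LimitVideoPresentSources", 0, winreg.REG_DWORD, 1)

if __name__ == "__main__":
    apply_gpu_registry_fix()
    print("Done - reboot for the change to take effect.")
[/code]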

TheaRender provides a little more guidance:"- Configure Windows Watchdog (Windows only). Windows runs a service, called "watchdog" that monitors the graphic driver. If the driver does not respond within 2 seconds, it decides that there is a kind of instability so it terminates and restarts the driver process. The driver is the process responsible for handling all Presto commands to the GPU and - unfortunately - Presto is responsible for keeping the driver super-busy when there is a heavy rendering job.
You can actually configure watchdog service. And this is the recommendation to stay in the safe side, in all cases. In that situation, you can even set your device priorities to Highest (which means fastest Presto - at the expense of a less responsive graphic system).
Read here [ http://msdn.microsoft.com/en-us/library ... 85%29.aspx ] about watchdog service.
" [ https://www.thearender.com/site/index.p ... u-cpu.html ]

My current impression is that a number of factors are at play when one tries to install more GPUs in a system than it has factory PCIe slots. I also do not know for certain whether an 8-GPU limit applies to the latest Windows OSes, but I am beginning to doubt that the limit is OS-based; rather, it appears to have more to do with the particular system, application and GPUs. I haven't seen any evidence of a single system running more than 8 GPUs on OctaneRender, FurryBall, TheaRender or Redshift3d. Moreover, I haven't been able to find any GPU maximums for either FurryBall ("Do I need FurryBall licence for each GPU? No, the license is per workstation - number of GPUs in computer is UNLIMITED") or TheaRender.

P.S. I opted for the Amfeltec splitters over the chassis because of the transfer speed and price of the splitters, and because I have some classic tower chassis that I can Dremel into GPU expansion service [ see the "Waste Not; Want Not" post, above ].
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Tutor,

Your analyses are impressive.

My opinion on multi-GPU rendering: shame on OTOY that a single licence cannot handle, e.g., 16 GPUs in over-the-network rendering (16 GPUs is the theoretical limit on an 8-slot mobo with 8 dual-GPU cards).
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm
Contact:

Thanks for the information - it's good to read such pieces in one place. Have to say that most of it is above my understanding & thus I'm keen on learning more 'bout that. Please share as You go & I'm looking forward to what You end up with using the Amfeltec splitter in that custom box! =)
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

smicha wrote:Tutor,

Your analyses are impressive.

My opinion on multi-GPU rendering: shame on OTOY that a single licence cannot handle, e.g., 16 GPUs in over-the-network rendering (16 GPUs is the theoretical limit on an 8-slot mobo with 8 dual-GPU cards).
" NEVER PLAN TO PUT/BUY DIFFERENT CARDS IF YOU THINK ABOUT WATERCOOLING" [per Smicha]. ALSO, NEVER PLAN TO PUT/BUY DIFFERENT CARDS IF YOU THINK ABOUT CREATING A MASSIVELY PARALLEL PROCESSING SYSTEM, UNLESS YOU ARE SURE THAT THE MIX OF CARDS WILL NOT CAUSE A RESOURCE/IO SPACE ISSUE [ http://render.otoy.com/forum/viewtopic. ... 37#p231837 ].
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Dealing with GPU IO space requirements & what to look for in a motherboard


Here's what Gordan79 [ http://forums.evga.com/Evga-SR2-with-8- ... 24-p3.aspx ], who appears to be very knowledgeable on the issue of the IO space requirements of Nvidia GPUs (and on the EVGA SR-2 motherboard), has to say on the subject (when discussing putting 10 and 14 GPUs on the SR-2):

"There should be no problem with the number of GPUs as long as you don't run out of PCIe I/O memory. In theory, the BIOS on the SR-2 should allow for up to 3GB of PCIe I/O memory, but it isn't particularly clever with how it allocates it. It doesn't get allocated anywhere nearly contiguously, and there are gaps it won't reuse, so in reality you are limited to a lot less than 3GB due to the crap BIOS. Also, as you may infer from the 3GB limit, it will only map I/O memory below the 32-bit limit, even though most GPUs advertise their required I/O memory as 64-bit capable.

A typical GeForce card requires:
1) 1x 128MB block
2) 1x 32MB block
3) 1x 16MB block
adding up to a total of 176MB. So in theory, you should be able to get up to 17 GeForce GPUs running on the SR-2. In practice due to the way the BIOS allocates the I/O memory you'll be lucky to get anywhere near that.

Tesla and Quadro cards are different in that they have much bigger I/O memory demands (you can modify the BIOS to adjust that on both GeForce and Tesla/Quadro cards), so obviously you'll get much fewer of those in without VBIOS modifications... . The only setting in the BIOS that is relevant to running many GPUs is the memory hole (which is settable to a maximum of 3GB). No other setting is relevant to running many devices with large I/O memory requirements. There is no way to influence where the BIOS maps the different I/O memory blocks, so it will either work or it won't, and if it doesn't, your only hope is to start modifying the soft straps on the VBIOS.

Those are documented here:
https://indefero.0x04.net/p/envytools/s ... straps.txt
The bits you would need to adjust are:
BAR 0: bits 17,18,19
BAR 1: bits 14,15,20,21,22
BAR 2: bit 23

To adjust them, dump the BIOS and look at the location appropriate to your GPU to get the current strap (remember to swap the byte order) and modify it accordingly. Flash the new strap to the GPU with nvflash --straps. Getting it wrong will in many cases result in a bricked GPU and you will need to unbrick it by booting it with the BIOS chip disabled, then re-flash it. If you are planning to experiment with this I highly recommend soldering a switch and a resistor (in series) across the VCC and GND for easy unbricking.
... .

IMO, given what you are spending on GPUs already, you would do well to invest into a decent motherboard first. Something with a UEFI firmware would be a good start.

Given you are trying to use up to 14 GPUs (7 slots with dual GPU cards), you will need at least 2688MB of PCI IOMEM area. Although the SR-2 can theoretically deliver up to 3GB, I would be very surprised if the BIOS' IOMEM mapping had an allocation occupancy good enough to give you enough. Also note that SR-2 uses a legacy BIOS, not an UEFI one, so all BARs, regardless of whether they are 64-bit capable, have to get mapped under the 4GB limit (hence the 3GB IOMEM limitation)." [Emphasis added]

Obviously, Gordan, like many others, is not a big fan of the SR-2's BIOS implementation.
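To sanity-check Gordan's numbers, here's a small sketch of the same arithmetic (my own; it treats the 3GB window as perfectly contiguous, which, as he notes, real BIOSes don't achieve, so treat the result as an upper bound):

[code]
# Back-of-the-envelope PCIe I/O-memory math using the per-GPU BAR blocks Gordan
# lists for a typical GeForce card and the SR-2's 3GB below-4GB mapping window.
GEFORCE_BARS_MB = (128, 32, 16)        # per-GPU blocks quoted above
PER_GPU_MB = sum(GEFORCE_BARS_MB)      # 176 MB per GPU
IOMEM_WINDOW_MB = 3 * 1024             # 3GB window on the SR-2

max_gpus = IOMEM_WINDOW_MB // PER_GPU_MB
print(f"{PER_GPU_MB} MB per GPU -> at most {max_gpus} GPUs in a {IOMEM_WINDOW_MB} MB window")
# -> 176 MB per GPU -> at most 17 GPUs, matching Gordan's theoretical figure
[/code]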
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.