Best Practices For Building A Multiple GPU System

smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Thanks.

Another question - what fittings did you use for connecting the GPUs? There is so little space between them. And is there enough room between the cards for extra backplates?
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

thunderbolt1978 wrote:Hi Tutor,

I'm running a rig with 7x GTX 780 6GB; they work like a charm. I used the same board with 7x Titan X and got stuck at the BAR problem.

1x GTX 780 6GB uses 178 MB of IOMEM; a Titan X uses 306 MB of IOMEM. As far as I can see, and from what Gordan told us, the 32-bit BIOS is the problem. Did you use UEFI-based BIOSes, or did you change the BAR strap?

The hardest part was modding the Titan X to single slot.

kind regards
thunderbolt
OS Matters Much

Hello Thunderbolt1978,

I'm in the process of consolidating my 100+ CUDA GPU processors from twenty-four 3d rendering systems into as few systems as possible. It now appears that I'll be able to get all of those GPUs into twelve systems, and that includes three MacPros (2x MacPro2,1 [2007] and 1x MacPro4,1 [2009] that's been hacked into a MacPro5,1 [2010-2012]). My Macs appear to support the fewest GPUs of all of my systems. The other nine GPU-oriented systems are 4x SuperMicros (LGA 2011 v1 and v2), 1x Tyan (circa 2009 to 2010 - LGA 1366), 2x EVGA SR-2s (LGA 1366) and 2x EVGA X79 Dark Classifieds (LGA 2011 v1 and v2). My twelve other systems will be preserved/reserved for CPU 3d rendering [and later for more GPU rendering, as I move my older GPUs into them when I put Pascals, Voltas, etc. into my newer systems]. So, needless to say, I'm working with a variety of BIOSes, some UEFI and some archaic. I have to fit this consolidation process in between my rendering jobs, so it'll be a slow, protracted process and will apparently be somewhat perpetual.

I've read your conversation [ http://forums.evga.com/Evga-SR2-with-8- ... 24-p3.aspx ] with Gordan79 on EVGA's site for background. I note that, like mine, your business uses Octane, Redshift, Blender and Furryball. I also use TheaRender. Fortunately for you, I scheduled my consolidation process to work on my older systems first; my reasoning is that learning strategies for the oldest/most stubborn systems first gives me the experience to maximize the potential of my later, UEFI systems. In starting with the Tyan (which, like your MBD-X8DTH, is an LGA 1366 motherboard), I've learned that the OS plays a great role, at least with non-UEFI BIOSes. My Tyan wouldn't boot under Windows with more than 8 GPU processors (4x GTX 590s); but when I switched the boot OS to Linux Mint 16 (it's free), the system booted normally with 4x GTX 590s + 1x GTX 480, then with 5x GTX 590s, and it kept booting normally as I added more GTX 590s, until I had populated all eight of the system's double-wide PCIe slots with 8x GTX 590s, for a total of 16 GPU processors. Moreover, the system used every one of those GPU processors, but Octane would use only 12 [ viewtopic.php?f=40&t=43597&start=240#p248320 ]. This leads me to believe that Linux, unlike the other OSes, maximizes the IO memory hole when booting. So for the time being I will not be modifying the boot straps, since I'll be dedicating my GTX 590 [and GTX 480] rendering systems to TheaRender Presto Hybrid GPU/CPU Rendering*/ [ https://www.thearender.com/site/index.p ... u-cpu.html - "... with every GPU and CPU core running Presto, it means fast, very fast, rendering!" {Emphasis added} ] and Blender Cycles rendering.


*/ Regarding using TheaRender Presto Hybrid GPU/CPU Rendering with low-memory GPU processors like the GTX 480s, 590s and 690 in my Tyan and SR-2 systems - "Rendering high resolution images with multiple channels is usually an issue since the GPU memory can be a restricting factor. With our bucket rendering implementation of Presto, we managed not only to overcome this potentially limiting factor but also to improve scalability on bigger render clusters." [ http://theapresto.com/bucket-rendering.html ] Redshift3d also supports bucket rendering, as well as out-of-core textures and geometry [ https://www.redshift3d.com/support/faqs/#question554 ]. Neither Redshift nor Thea has an arbitrary 12-GPU limit, but running more than 8 GPU processors in Redshift does require running additional instance(s) on the same system, per Redshift's recommendation.
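
On the earlier point about Linux maximizing the IO memory hole at boot: a rough sketch of my own (assuming a Linux boot and root privileges - unprivileged reads of /proc/iomem show zeroed addresses) for tallying how much IO memory the kernel has handed out per device/driver:

Code:

#!/usr/bin/env python3
# Rough sketch: tally the IO memory (MMIO) regions the kernel has assigned,
# grouped by the owner string in /proc/iomem. Run as root; unprivileged reads
# show zeroed addresses. Nested child regions are listed under their own
# owner labels, so treat the per-owner totals as a rough view only.
import re
from collections import defaultdict

totals = defaultdict(int)
with open("/proc/iomem") as f:
    for line in f:
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)", line)
        if not m:
            continue
        start, end, owner = int(m.group(1), 16), int(m.group(2), 16), m.group(3)
        totals[owner] += end - start + 1

# Print the 20 largest consumers of IO memory space.
for owner, size in sorted(totals.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{size / 2**20:10.1f} MiB  {owner}")
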
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
thunderbolt1978
Licensed Customer
Posts: 11
Joined: Wed Mar 11, 2015 9:20 am

I played around with one Titan X card today and was not able to change anything. I wrote to around 10 guys; maybe they can change something. In my case the mainboard doesn't boot: with 5 Titan X cards it works, with 6 it doesn't. 7x GTX 780 cards run perfectly.

7x GTX 780: 7 x 178 MB = 1246 MB IOMEM
5x Titan X: 5 x 306 MB = 1530 MB IOMEM
7x Titan X: 7 x 306 MB = 2142 MB IOMEM

I have 2x SR-X here with 2x Xeon E5-2687W. This board has a 128 Mb UEFI BIOS and 7 PCIe 3.0 slots. I purchased the 2x 16 GPU splitter from Amfeltec today; it shipped today. I will test and post.
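
For what it's worth, once the splitters are in, a quick sanity check before benchmarking is to count what the driver actually enumerates. A minimal sketch (assuming the NVIDIA driver and nvidia-smi are installed; the expected count is just a placeholder for your own setup):

Code:

#!/usr/bin/env python3
# Rough sketch: count the GPUs the NVIDIA driver enumerates and compare
# against what is physically installed. Assumes nvidia-smi is on the PATH.
import subprocess

EXPECTED = 12  # placeholder - set to the number of cards physically installed

out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
gpus = [line for line in out.stdout.splitlines() if line.startswith("GPU ")]
for line in gpus:
    print(line)
print(f"driver sees {len(gpus)} GPU(s), expected {EXPECTED}")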

My GPU rigs are:
2 systems with 14x Tesla K40 = 40320 CUDA cores (compute 3.5)
2 systems with 14x Titan X = 43008 CUDA cores (compute 5.2)

but we need more ......

kind regards
thunderbolt
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

thunderbolt1978 wrote:I played around with one Titan X card today and was not able to change anything. I wrote to around 10 guys; maybe they can change something. In my case the mainboard doesn't boot: with 5 Titan X cards it works, with 6 it doesn't. 7x GTX 780 cards run perfectly.

7x GTX 780: 7 x 178 MB = 1246 MB IOMEM
5x Titan X: 5 x 306 MB = 1530 MB IOMEM
7x Titan X: 7 x 306 MB = 2142 MB IOMEM

I have 2x SR-X here with 2x Xeon E5-2687W. This board has a 128 Mb UEFI BIOS and 7 PCIe 3.0 slots. I purchased the 2x 16 GPU splitter from Amfeltec today; it shipped today. I will test and post.

My GPU rigs are:
2 systems with 14x Tesla K40 = 40320 CUDA cores (compute 3.5)
2 systems with 14x Titan X = 43008 CUDA cores (compute 5.2)

but we need more ......

kind regards
thunderbolt

Unfortunately, my ability to lend specific assistance to you is greatly hampered by my having only EVGA SR-2s (and not any EVGA SR-Xs) and by EVGA's practice of providing, at best, only the most cursory system BIOS detail in their motherboard manuals. Thus, I can't even begin to help you manually set the SR-X's BIOS to maximize IOMem unless you provide me with screenshots of each and every BIOS screen related to PCIe settings, especially any setting entitled "Above 4G" or the like.

Gordan79 is as good an EVGA motherboard resource as I have been able to find. Thus, since it is his opinion that Nvidia "bumped the total BAR size from 176MB to 304MB*/ between Kepler and Maxwell" GPUs (I decided to skip Maxwell because of its teeny-weeny bump in rendering speed over the GTX 700 series), I have no basis to doubt the accuracy of Gordan79's assessment. What Nvidia may have been relying on when it released Maxwell was that then-current UEFI motherboards with an "Above 4G" (or similarly named) setting could accommodate almost a doubling of IOMem space per GPU processor. Without any "Above 4G" functionality, and without access to any up-to-date guide on manually hacking the BARs of Maxwell cards (which greatly increases the risk of bricking the GPU in the process), my final suggestion remains to try booting the system under Linux, which appears able to work around at least some poorly designed IOMem setups and, to that extent, provides an IOMem cure at the boot stage for as much IOMem space as is available for use. If Linux "opens" your system's IOMem hole wider, then you can try to find a way to run your particular non-Linux apps via a Windows emulator and assess whether any emulator-related performance drop is worth the effort.

*/ 1024 * 4 = 4096; 4096 / 304 ≈ 13.47, whereas 4096 / 176 ≈ 23.27. Of course these figures could cause one to assume, incorrectly, that nothing else is using IOMem space, but they do give a useful comparison of the approximate maximum PCIe device counts for Maxwells vs. Keplers at a given amount of IOMem space.
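
To make that footnote arithmetic concrete, a small sketch of my own, using the 176 MB and 304 MB per-card figures quoted above and ignoring whatever else occupies the hole:

Code:

# Rough upper bound on GPU count inside a 32-bit (4096 MB) IO memory hole,
# using the per-card BAR footprints quoted above and ignoring other devices.
WINDOW_MB = 4 * 1024  # 4096 MB addressable below 4G

for name, per_card_mb in [("Kepler (GTX 780 class)", 176), ("Maxwell (Titan X class)", 304)]:
    max_cards = WINDOW_MB // per_card_mb
    print(f"{name}: {WINDOW_MB} / {per_card_mb} = {WINDOW_MB / per_card_mb:.2f} "
          f"-> at most {max_cards} cards before the hole is exhausted")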


P.S.
(1) Since it's Gordan79's assessment that your Maxwells (Titan Xs) use a lot more IOMem than your GTX 780s (Keplers), what happens when you install only the Tesla K40s (Keplers) in the SR-Xs? Can you run 7 of them on each SR-X?

(2) With regard to your Supermicro X8DTH motherboards and running Linux on them, you may want to explore the functionality of the setting - “SR-IOV Supported” :

"Select Enabled to enable Single Root I/O Virtualization (SR-IOV) support which works in conjunction with the Intel Virtualization Technology and allow multiple operating systems running simultaneously within a single computer via natively share PCI-Express devices in order to enhance network connectivity and performance.
The options are Enabled and Disabled."
X8DTH-6/X8DTH-6F/X8DTH-i/X8DTH-iF User's Manual, p.4-13

Otherwise, I didn’t see anything in it to indicate that it has “Above 4G” functionality either.
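
A quick way to check from a Linux boot whether that IOMMU/virtualization support is actually active is to look for /sys/kernel/iommu_groups; a minimal sketch of my own (the directory only appears when the BIOS option and the kernel's IOMMU are both enabled):

Code:

#!/usr/bin/env python3
# Rough sketch: list IOMMU groups and the PCI devices in each by walking
# /sys/kernel/iommu_groups. The directory exists only when VT-d/IOMMU support
# is enabled in both the BIOS and the kernel.
import os

root = "/sys/kernel/iommu_groups"
if not os.path.isdir(root):
    print("No IOMMU groups found - VT-d/IOMMU is probably disabled.")
else:
    for group in sorted(os.listdir(root), key=int):
        devices = os.listdir(os.path.join(root, group, "devices"))
        print(f"group {group}: {' '.join(sorted(devices))}")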

(3) With more modern Supermicro motherboards that have "Above 4G" as a BIOS option, like the MBD X9DRXs [ http://www.supermicro.com/products/moth ... drx_-f.cfm - http://www.superbiiz.com/query.php?s=X9DRX - under $500 ] and MBD X10DRXs [ http://www.supermicro.com/products/moth ... x10drx.cfm - http://www.amazon.com/Supermicro-X10DRX ... B00XQ3I14K - about $650 ], you might be able to run the 14x Tesla K40 (= 40320 CUDA cores, compute 3.5) and at least some of the Titan Xs from one motherboard, and any remaining Titan Xs and other GPU processors from a second such motherboard, leaving room for further growth on that second one.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

thunderbolt1978 wrote:Hi Tutor,

I'm running a rig with 7x GTX 780 6GB; they work like a charm. I used the same board with 7x Titan X. ...

kind regards
thunderbolt
Finally! Finally someone has done this!!! I've been dreaming about this route for a year, but I neither had a need for it nor did any of my friends want it - but heck, it's the best way to put so much power into one motherboard!!!

I believe with 7x Titan Xs you should see ~1200+ in OctaneBench! That's proper power.

Could you talk more about the difficulties of converting the Titan X to single slot? =)

Thanks,
tom
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Thunderbolt,

In case you missed my question: what fittings did you use for connecting the GPUs? There is so little space between them. And is there enough room between the cards for extra backplates?

PS. I'd expect an OctaneBench score of about 1000-1050 on 7 Xs. Is that what you get?
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
thunderbolt1978
Licensed Customer
Posts: 11
Joined: Wed Mar 11, 2015 9:20 am

Hi,

One Titan X card has a single-GPU score of around 125, so 7 of them should give around 875-900. My Amfeltec splitter arrives in 2 days. The first thing I will do is use 12 cards and render the benchmark; I hope the upload works. There is no backplate on the Titan cards. I chose the Alphacool Titan X waterblock, and for the connection between the slots I used this: http://www.alphacool.com/product_info.p ... black.html. I don't fully trust them, but I don't see any water anywhere.

I can only bench with 5 cards so far; that was around 620. I haven't optimized the BIOS yet; later I'll modify the BIOS to disable turbo boost. I'm running several tests on the cards and maximizing voltage and clock rates; I don't touch the memory settings.

First you have to remove the dual-slot bracket; in my case the cards have 2 DVI ports. Remove the second DVI port and cut the dual-slot metal plate down to a single-slot one. It takes around 10 minutes per card.

kind regards
thunderbolt
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Thanks for the link to the SLI fitting - no need to worry about leaks, it looks good.

When you said 'remove the second DVI port' - did it require cutting any wires on the PCB or any other brute-force intervention?
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
thunderbolt1978
Licensed Customer
Posts: 11
Joined: Wed Mar 11, 2015 9:20 am

Removing the second DVI means you have to unsolder the 24 DVI pins ... :-)
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

thunderbolt1978 wrote:Removing the second DVI means you have to unsolder the 24 DVI pins ... :-)

That's what I thought :) Thanks.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540