Best Practices For Building A Multiple GPU System

Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Post by Tutor » Thu Jul 16, 2015 12:55 pm
An imaginative mind is a terrible, terrifying and dangerous thing to waste, especially when it comes to sandwiches - mozzarella?
What if it's a warped mind? :lol:

Can you increase the GPU spacing and GPU/environmental cooling? That way, there would be no taste of dust, unless the wind is blowing against your backside. Know what - all of these calculations have made my temperature rise and given me a voracious appetite. I think that I'll make a cool sandwich (or two or three). Also, I like more bread than mozzarella cheese. I like my cheese thin and cut into a perfect square and sandwiched tightly (as if glued ) between two larger, cooling, and screwed together slices of bread (one wet and the other airy).
Speaking of food, I actually once restored the failing ATI Radeon card in my XP machine by baking it in the oven... 20 minutes at 380 degrees, if I recall correctly. So I can definitely appreciate the culinary skills I see there!

I did get an Amfeltec GPU cluster (single adapter, 4-way), and I am now using it for 2 of the Z's. It runs very nicely and I am very happy with its performance. It is sturdy as well, almost a little too robust: heavy with the GPUs mounted, like a 30 lb weight. But no issues at all. I have these 2 cards blowing air outwards to the same side, and they run very cool, though at stock speed. The other 3 are on risers on an ad-hoc, grill-like stand, fans up, blowing air upwards; 2 of these are OC'd by default. I like the cluster setup more, but it is harder to tinker with: the direct plug-in to the board is less flexible than the cable risers, which are very fluid whenever I examine or move the cards around.

Tutor and all, I am using MSI Afterburner, which allows for group tweaking of GPUs by type, or individual tweaking. But the app only gives the option to directly tweak 8 GPUs. In other words, it doesn't let me individually tweak the parameters for GPUs 9 and 10; the list only goes from 1-8. One of those is the X, which means 7 Z cores are available for individual tweaking and 2 are not (remember, I have 9 Z cores currently recognized and functioning, not 10). It just so happens, however, that when I use the group tweak, it affects them all. I do not know whether that is because the app sends a mass message to all of the Z's, or because in my case one half of a Z is among the 1-8 listed on MSI Afterburner's control panel for direct voltage/fan/etc. control while the other half is listed as 9 or 10. It does, however, let me view all 10 GPU cores in the live graph area, so I can monitor all 10 very effectively.

On EVGA Precision, it comes back upon startup with the error "Too many GPU"... How lame is that... :( At times I run the Titan X solo as primary so I can try things, but on a limited scale of course. I have only tried MSI Afterburner and the EVGA Precision tool so far.

In general, however, anyone using software to tweak card settings (such as voltage, temperature ceiling, power limit, fan speed, memory, etc.) may encounter roadblocks from that software when looking to overclock a multi-GPU arrangement.
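For what it's worth, NVIDIA's NVML library is not capped at 8 devices, so the monitoring side at least can be scripted around such limits. Below is a minimal, read-only sketch in Python using the pynvml bindings (an assumption on my part - these bindings are separate from Afterburner and Precision) that lists every GPU processor the driver reports, with temperature, clocks and power draw.

[code]
# Minimal NVML monitoring sketch (assumes the pynvml package is installed).
# Read-only: it reports temperature, clocks and power for every GPU processor
# the driver exposes, so it is not limited to the 8 entries Afterburner lists.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"NVML sees {count} GPU processors")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        core = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
        mem = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        print(f"GPU {i}: {name}  {temp} C  core {core} MHz  mem {mem} MHz  {watts:.0f} W")
finally:
    pynvml.nvmlShutdown()
[/code]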
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:
Post by Tutor » Thu Jul 16, 2015 12:55 pm
An imaginative mind is a terrible, terrifying and dangerous thing to waste, especially when it comes to sandwiches - mozzarella?
What if it's a warped mind? :lol:

A warped mind shouldn't be wasted either.
Notiusweb wrote:
Tutor and all, I am using MSI Afterburner, which allows for group tweaking of GPUs by type, or individual tweaking. But the app only gives the option to directly tweak 8 GPUs. ... It just so happens, however, that when I use the group tweak, it affects them all. ... It does, however, let me view all 10 GPU cores in the live graph area, so I can monitor all 10 very effectively.

On EVGA Precision, it comes back upon startup with the error "Too many GPU"... I have only tried MSI Afterburner and the EVGA Precision tool so far.

In general, however, anyone using software to tweak card settings (such as voltage, temperature ceiling, power limit, fan speed, memory, etc.) may encounter roadblocks from that software when looking to overclock a multi-GPU arrangement.
I too use the same two tweaking tools: MSI Afterburner and EVGA Precision. Thanks for pointing out how tools such as those two behave (and their shortcomings) in a massive GPU system. Thanks especially for this information: "It just so happens, however, that when I use the group tweak, it affects them all." This suggests that a GPU tweaker building a massive GPU system might be better situated with identical GPUs, or with as few different types as possible. Have you determined whether slot placement can be used to "find" a GPU that might otherwise not be listed?
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Post by Tutor » Fri Jul 17, 2015 8:26 pm
Have you determined whether slot placement can be used to "find" a GPU that might otherwise not be listed?
You are right, this should be the variable tested. In fact, there does seem to be a relationship between the placement of cards in the PCIe lanes and their listing in MSI Afterburner. However, it's not an intuitively predictable, direct relationship, as in lane 1 = GPU 1. I know because my lane 1 is the Titan X, and the Titan X appears as GPU 3 in MSI Afterburner. I did switch PCIe risers around in lanes 4, 5, and 6, and the listing of the OC'd cards did get reshuffled, but again the results were not intuitively predictable: one stayed the same no matter where I moved it, and the other 2 switched. Again, these are only my experiences, so it may not be a universal event one would face. Anyway, I think I will have better results going to the MSI Afterburner forum and putting in a request for 10-12 GPU control.
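One way to pin down which listed GPU sits in which physical slot, whatever order a tool happens to use, is to read each device's PCIe bus ID. A small sketch along the same lines (again assuming the Python pynvml bindings, which are my addition, not anything MSI Afterburner documents); note also that CUDA applications can be made to enumerate in slot order by setting the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID, since CUDA's default ordering is fastest-first rather than slot order.

[code]
# Map each NVML device index to its PCIe bus ID so a listed GPU can be tied
# back to a physical slot (assumes the pynvml package is installed).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):
            name = name.decode()
        pci = pynvml.nvmlDeviceGetPciInfo(handle)
        bus_id = pci.busId.decode() if isinstance(pci.busId, bytes) else pci.busId
        print(f"index {i}: {name}  bus {bus_id}")
finally:
    pynvml.nvmlShutdown()
[/code]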

Also, on another MSI Afterburner note, I played around some more with temperature control and overclocking, and cracked the 1,000 mark on OctaneBench's max score.
I really like the 1x GT 730M for a score of 2.43. That test must have been less about expectation and more about curiosity and intrigue with the potential of the system... I imagine that person probably came away with a desire to push the boundaries of their rig further. That non-quantifiable component of art is impressive. :)
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

Notiusweb wrote:
Also, on another MSI Afterburner note, I played around some more with temperature control and overclocking, and cracked the 1,000 mark on OctaneBench's max score.

Congrats! =) that's an achievement!
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Post by glimpse » Sun Jul 19, 2015 3:58 am
Congrats! =) that's an achievement!
Thank you, Tom. Got good mentoring from you on the overclock!
http://tomglimps.com/value-based-multi- ... ne-render/
8-)
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:
... Also, on another MSI Afterburner note, I played around some more with temperature control and overclocking, and cracked the 1,000 mark on OctaneBench's max score.


Congratulations. I knew your system could do it, and I believe that it can go further. With 10 overclocked GPU processors such as the ones you've got, a score at or over 1,100 points is within reach, particularly if I can reach over 900 points with eight overclocked GPU processors. So keep at it. Can GPU PCIe slot selection not only affect GPU recognition/grouping in MSI AfterBurner, but also GPU recognition in the OS, as well as affect GPU performance?
Notiusweb wrote:
I really like the 1x GT 730M for a score of 2.43. That test must have been less about expectation and more about curiosity and intrigue with the potential of the system... I imagine that person probably came away with a desire to push the boundaries of their rig further. That non-quantifiable component of art is impressive. :)
Hopefully, they've got no way to go but up.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

GPU Tweaking in Linux

Since it appears that Linux holds the key to fully maximizing the working GPU count per system, I've found a few resources for those interested in using Linux and wanting to tweak their GPUs there (a rough scripting sketch follows the links below):

1) http://discourse.ubuntu.com/t/lets-burn ... linux/1610 ;
2) "How To Overclock New NVIDIA GPUs On Linux" - http://www.phoronix.com/scan.php?px=MTY ... =news_item ;
3) https://wiki.archlinux.org/index.php/NVIDIA#Tweaking ;
4) https://www.gpugrid.net/forum_thread.php?id=3713 ; and
5) http://mintguide.org/system/445-overclo ... -mint.html .

And if perchance you want to speed up your CPU: http://forum.odroid.com/viewtopic.php?f ... 0000#p3338 .
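The common thread in those links is enabling the driver's "Coolbits" option (e.g. nvidia-xconfig --cool-bits=28, then restarting X) and then pushing clock offsets and fan targets through nvidia-settings. The sketch below is a rough illustration only: the offsets are placeholders rather than recommendations, the attribute names are the ones the Arch wiki describes, and the performance level and fan indices can differ by driver and card, so verify with nvidia-settings -q all before trusting it.

[code]
# Rough sketch of scripting nvidia-settings on Linux once Coolbits is enabled.
# Requires a running X session; all offsets and the fan speed are placeholders.
import subprocess

def set_attr(assignment: str) -> None:
    """Apply one nvidia-settings assignment string."""
    subprocess.run(["nvidia-settings", "-a", assignment], check=True)

GPU_COUNT = 10  # adjust to however many GPU processors the system exposes

for gpu in range(GPU_COUNT):
    # +100 MHz core / +300 MHz memory offsets at performance level 3
    # (often the highest level on Kepler/Maxwell cards - check with nvidia-settings -q all)
    set_attr(f"[gpu:{gpu}]/GPUGraphicsClockOffset[3]=100")
    set_attr(f"[gpu:{gpu}]/GPUMemoryTransferRateOffset[3]=300")
    # take manual fan control and pin the fan at 80%
    # (fan indices do not always map 1:1 to GPU indices on dual-GPU or multi-fan cards)
    set_attr(f"[gpu:{gpu}]/GPUFanControlState=1")
    set_attr(f"[fan:{gpu}]/GPUTargetFanSpeed=80")
[/code]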

You're on your own, and only you are responsible for the outcome(s).
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Post by Tutor » Sun Jul 19, 2015 1:35 pm
Can GPU PCIe slot selection not only affect GPU recognition/grouping in MSI AfterBurner, but also GPU recognition in the OS, as well as affect GPU performance?

Slot selection does impact the sequential listing of a GPU in both Device Manager and Nvidia Control Panel. As to what effect this in itself has, I do not know.
As far as GPU performance and speed go, I guess I could try to race an OC'd Z core against another on lanes 5 vs. 6. I could also race a stock core on lane 3 against a stock core on lane 4; however, one would be running through a riser and the other on the Amfeltec card's lane, so I don't know how clean the results would be as far as giving lane-vs-lane speed. But I could try at some point.
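If you do race the lanes, a more controlled comparison than a full render might be to time an identical host-to-device copy on each card. Here is a sketch using CuPy (my own choice of library, assumed to be installed - it is not something mentioned in this thread) that reports a rough PCIe upload rate per GPU.

[code]
# Rough host-to-device bandwidth check per GPU (assumes CuPy and NumPy).
# The same 256 MB buffer is copied to each device several times and the best
# rate is reported; differences mostly reflect the PCIe link behind each card
# (slot, riser, or Amfeltec adapter) rather than the GPU itself.
import time

import cupy as cp
import numpy as np

host = np.ones(64 * 1024 * 1024, dtype=np.float32)  # 256 MB buffer
size_gb = host.nbytes / 1e9

for i in range(cp.cuda.runtime.getDeviceCount()):
    with cp.cuda.Device(i) as dev:
        best = float("inf")
        for _ in range(5):
            start = time.perf_counter()
            on_gpu = cp.asarray(host)  # host -> device copy
            dev.synchronize()          # wait for the copy to finish
            best = min(best, time.perf_counter() - start)
            del on_gpu
        print(f"GPU {i}: ~{size_gb / best:.1f} GB/s host-to-device")
[/code]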

Tutor, is your rig completed or still in progress? If still in progress, are you going to be using it during construction, or will you wait for a certain level of set-up before the first tests/usage?
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

There's an 8x Titan X system benched on OctaneBench at 1,007 pts. Who are you?
Last edited by Tutor on Fri Jul 24, 2015 1:52 am, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:
Post by Tutor » Sun Jul 19, 2015 1:35 pm
Can GPU PCIe slot selection not only affect GPU recognition/grouping in MSI AfterBurner, but also GPU recognition in the OS, as well as affect GPU performance?
How is each of your video cards cooled - air, traditional water-cooling, or hybrid (air and water)?
Notiusweb wrote:
Slot selection does impact the sequential listing of a GPU in both Device Manager and Nvidia Control Panel. As to what effect this in itself has, I do not know.


First, what I am hoping is that you can get all 10 of your Titan Z GPU processors working, along with your Titan X, for 11 total GPUs by rearranging them. Secondly, given that slot selection does impact the sequential listing of your GPUs in both the Device Manager and the Nvidia Control Panel, it is likely that it also affects the listing in MSI AfterBurner. Perhaps you can arrange the Titan X and the Titan Z GPUs with the greatest overclocking headroom so that they appear in Afterburner; set the Titan X singularly to its best safe overclock, set the Titan Zs globally as you appear to have been doing, and save that as a profile to be loaded on reboot. Then reboot, and see if you can improve performance further by concentrating on tweaking the faster GPUs individually, saving those changes in another profile to be reloaded on reboot.
Notiusweb wrote:
As far as GPU performance and speed go, I guess I could try to race an OC'd Z core against another on lanes 5 vs. 6. I could also race a stock core on lane 3 against a stock core on lane 4; however, one would be running through a riser and the other on the Amfeltec card's lane, so I don't know how clean the results would be as far as giving lane-vs-lane speed. But I could try at some point.
Lane v. lane speed shouldn't matter much except for the largest scene-data transfers. Another approach, if heat is a significant performance inhibitor, might be to arrange the GPUs, to the extent possible (given that almost all of them are dual-GPU cards), so that the warmest/highest-performing cards are closest to the floor, and/or to place a plastic shield between the upper and lower cards in the Amfeltec chassis to divert heat from the lower cards away from the upper ones. Also, are you using a separate fan to further cool the GPUs on the Amfeltec chassis?
Notiusweb wrote:
Tutor, is your rig completed or still in progress? If still in progress, are you going to be using it during construction, or will you wait for a certain level of set-up before the first tests/usage?
They're still in progress. I'm upgrading one Tyan and three of my four Supermicro systems at this time (most of the GPU space of my fourth Supermicro is being saved for Pascals and any overflow from the other works in progress). As to the four systems now in progress, my consolidation goal is to have one 21-GPU system (all Titan 6Gs [6x Titan Z + 1x Titan Black + 8x Titans]) and three 12(+)-GPU systems: one all GTX 780 Ti 3Gs, another all GTX 780 6Gs, and the other mainly GTX 580 3Gs and 680 4Gs (each also taking any overflow from the other works in progress). They're all in different degrees of completion because I have to use them, along with my other systems, for current projects, but I intend to complete the 21-GPU system before completing the other three. Still to come:
1) I have to finish designing and fabricating parts for custom cooling ideas that I have (the fabrication involves revisiting my aluminum torch-welding chops - gold, silver and copper are too expensive);
2) all of the GPUs will be either traditionally water-cooled or converted into hybrids (air and water cooled);
3) I have to rank (per system) all of the GPUs thermally and for performance, as best I can, so that I can arrange them optimally, given the extremely useful information that you've given me about being able to see some GPUs and not others in AfterBurner (a rough ranking sketch follows below); and
4) when all modifications have been implemented and tweaking has been completed, I'll publish test scores.
Throughout this process I have to work around my need to have these systems available in the interim as rendering needs dictate. In sum, this time around I've decided to take the time to keep my network as fully functional as possible under the circumstances, and to let all that I've learned from 30 years of system building be reflected in each of these (and my subsequent) builds, so that I end up with the fewest possible "I wish I had ..."s.
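For ranking the GPUs thermally (item 3 above), one low-effort approach is simply to sample NVML temperatures while a typical render is running and then sort the peaks. A minimal sketch, again assuming the Python pynvml bindings are available:

[code]
# Sample GPU temperatures while a render runs, then list the cards
# hottest-first as a rough thermal ranking (assumes pynvml is installed).
import time

import pynvml

SAMPLES = 60     # number of readings
INTERVAL_S = 5   # seconds between readings (~5 minutes total)

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    peaks = [0] * count
    for _ in range(SAMPLES):
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            peaks[i] = max(peaks[i], temp)
        time.sleep(INTERVAL_S)
    for i in sorted(range(count), key=lambda idx: peaks[idx], reverse=True):
        print(f"GPU {i}: peak {peaks[i]} C")
finally:
    pynvml.nvmlShutdown()
[/code]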
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.