Best Practices For Building A Multiple GPU System

smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

rappet wrote:
smicha wrote:Not on this board. You'd need the ASRock X99 WS-E
http://www.asrock.com/mb/Intel/X99%20WS-E/
But e.g. a 5820K or another i7 CPU could be sufficient to run 7 GPUs...
There is something about the lanes needed for communication between the GPU and CPU, right?
It has dual PLX chips, so 28-lane CPUs should work too. Above 4G Decoding must be enabled.
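A side note for readers wondering how dual PLX chips let a 28-lane CPU feed seven x16 slots: a PLX chip is a PCIe switch that fans one uplink out to several slots, sharing (not multiplying) bandwidth downstream. Below is a minimal sketch of the lane arithmetic; the uplink widths and slot counts are assumptions for illustration, not the board's documented topology.

```python
# Illustrative lane accounting for a dual-PLX board feeding 7 GPUs
# from a 28-lane CPU. Uplink widths and slot counts are assumed,
# not ASRock's documented topology.

CPU_LANES = 28  # e.g. an i7-5820K exposes 28 PCIe 3.0 lanes

# Each PLX switch takes one uplink from the CPU and fans it out to
# several x16 slots; downstream bandwidth is shared, not multiplied.
plx_switches = [
    {"uplink_lanes": 16, "slots": 4},
    {"uplink_lanes": 8,  "slots": 3},
]

uplink_total = sum(s["uplink_lanes"] for s in plx_switches)
slot_total = sum(s["slots"] for s in plx_switches)

assert uplink_total <= CPU_LANES, "uplinks must fit the CPU lane budget"
print(f"{slot_total} GPU slots fed by {uplink_total} of {CPU_LANES} CPU lanes")
```

The point of the sketch: the slots exist electrically regardless of the CPU, so a 28-lane part works; the GPUs simply share the uplink bandwidth when transferring data.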
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
rappet
Licensed Customer
Posts: 1962
Joined: Fri Apr 06, 2012 3:57 pm
Location: The Netherlands
Contact:

smicha wrote:
rappet wrote:
smicha wrote:Not on this board. You'd need the ASRock X99 WS-E
http://www.asrock.com/mb/Intel/X99%20WS-E/
But e.g. a 5820K or another i7 CPU could be sufficient to run 7 GPUs...
There is something about the lanes needed for communication between the GPU and CPU, right?
It has dual PLX chips, so 28-lane CPUs should work too. Above 4G Decoding must be enabled.
Funny thing is that a few years back I made the switch from CPU rendering to GPU rendering,
and I thought the CPU did not matter that much anymore once you render on the GPU.
The lanes topic is over my head at this point... I just don't know the technical side of it.
I was just wondering what minimum CPU is needed to get multiple GPUs working properly.

Anyway... thanks for sharing your knowledge.

cheers,

4090 + 3080 Ti & quad 1080 Ti
ArchiCAD 25, of course Octane & the OR-ArchiCAD plugin (love it)
http://www.tapperworks.com
http://www.facebook.com/pages/TAPPERWOR ... 9851341126
http://www.youtube.com/user/Tapperworks/videos
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

My recent 2x E5-2630 v4, clocked at 2.2 GHz (boosts to 3.1 GHz, but I saw 2.7 only), are so much faster than my 2600K at 4.5 GHz. Imagine 40 threads running at 37C at full load :mrgreen: Don't look at CPU clock alone. These Xeons are cheaper than a single 6950X. Xeons with ECC and 32 GB sticks rock. The cost is the same as an X99 i7 platform.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

PS: Xeon activity during 4K 60 fps playback and in ZBrush
You do not have the required permissions to view the files attached to this post.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Seekerfinder
Licensed Customer
Posts: 1600
Joined: Tue Jan 04, 2011 11:34 am

smicha wrote:
Seekerfinder wrote:
smicha wrote:Guys,

I have some incredible news - I am building my next workstation with 7 watercooled GTXs and dual Xeons on an ASRock Rack mobo - imagine that: I just wrote an email to ASRock support and they released a new BIOS exclusively for me to handle 7 GPUs without any problems! And it took only one day. Now when I think about Asus... what a shame.

Stay tuned for build log ;)
That's awesome, Smicha! Is it on the old X79 Extreme11? Can you share the BIOS, or will they make it available to the Octane community!?
Seeker
http://www.asrockrack.com/general/produ ... ifications
Thanks Smicha. I can't see whether that board has an additional PCIe power connector...? Any idea?
Win 8(64) | P9X79-E WS | i7-3930K | 32GB | GTX Titan & GTX 780Ti | SketchUP | Revit | Beta tester for Revit & Sketchup plugins for Octane
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

At the bottom, next to the RAM slots - a Molex connector.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Seekerfinder
Licensed Customer
Posts: 1600
Joined: Tue Jan 04, 2011 11:34 am

smicha wrote:At the bottom, next to the RAM slots - a Molex connector.
Found it, thanks.
Looks like a great board and great value.
Win 8(64) | P9X79-E WS | i7-3930K | 32GB | GTX Titan & GTX 780Ti | SketchUP | Revit | Beta tester for Revit & Sketchup plugins for Octane
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

rappet wrote:... Funny thing is that a few years back I made the switch from CPU rendering to GPU rendering,
and I thought the CPU did not matter that much anymore once you render on the GPU.
The lanes topic is over my head at this point... I just don't know the technical side of it.
I was just wondering what minimum CPU is needed to get multiple GPUs working properly. ...
Here are a few easy-to-read technical references on the lanes aspect - http://www.hardwaresecrets.com/everythi ... express/4/ & https://en.wikipedia.org/wiki/PCI_Express#Lane . Lane speed increases with the PCIe version number: the PCIe v2 standard is twice as fast as PCIe v1, and PCIe v3 is twice as fast as PCIe v2 (or four times faster than PCIe v1). The principle on which Amfeltec's x4 GPU-Oriented Splitters operate (each of the four attached GPU cards gets an x1 lane) is that a GPU's computation speed is the same regardless of lane count; where lane speed and count matter is in getting data from CPU/memory to GPU/memory. Over an x1 lane the data takes longer to arrive - usually measured in seconds to a minute or two, with CPU power/speed playing a role - like driving down a curvy country road with a single lane per direction. But once the data is in the GPU, the speed of computation is governed by the GPU's performance. With higher resolutions and more frames this becomes more noticeable, so the nature of the rendering project matters: a still image vs. a multi-frame movie/animation, and all-GPU vs. CPU-only vs. GPU/CPU hybrid rendering. Moreover, video editing on a system with a low lane count can be a nightmare unless those lanes are tied to a very fast (i.e., overclocked*/) CPU of recent vintage. In short, everything you intend to do on a system affects the number of lanes you should seek.
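To make the version-to-version doubling concrete, here is a rough calculation using the commonly quoted approximate usable per-lane rates (after 8b/10b encoding overhead for v1/v2 and 128b/130b for v3). The 4 GB scene size is an assumption for illustration.

```python
# Approximate usable bandwidth per PCIe lane, in MB/s, after
# encoding overhead (8b/10b for v1/v2, 128b/130b for v3).
LANE_MB_PER_S = {1: 250, 2: 500, 3: 985}

def transfer_seconds(payload_mb, version, lanes):
    """Rough host-to-GPU transfer time; ignores protocol overhead."""
    return payload_mb / (LANE_MB_PER_S[version] * lanes)

scene_mb = 4000  # a hypothetical 4 GB scene
t_x1 = transfer_seconds(scene_mb, 3, 1)    # splitter-style x1 link
t_x16 = transfer_seconds(scene_mb, 3, 16)  # full x16 slot
print(f"PCIe 3.0 x1:  ~{t_x1:.1f} s")
print(f"PCIe 3.0 x16: ~{t_x16:.2f} s")
```

Once the scene is resident in GPU memory, render speed is governed by the GPU alone; only this upload step stretches out on an x1 link, which is why splitters cost little on long still renders but hurt on animation workflows with frequent re-uploads.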


*/ Overclocking the CPU can increase the speed of travel per lane. Overclocking potential is greatest on i7 CPUs, because there it can be applied to discrete parameters. With Xeons from Sandy Bridge onward (i.e., post-Nehalem), even where the system allows overclocking, as on Supermicro's Hyper-Speed boards - https://www.supermicro.com/products/nfo/Hyper-Speed.cfm - overclocking is limited to 7.55% at maximum, and usually 4-5% in practice, because it affects so many other CPU parameters that the system will otherwise crash. "Supermicro Hyper-Speed - Use this item to select the hardware acceleration level of the machine. CPU, Memory, PCIe, and related-components will be accelerated in lockstep. Please note that an improper hyper-speed setting may impede the stability of your machine. The options are Disabled, Level 1, Level 2, Level 3, and Level 4 (not recommended). Supermicro Hyper-Turbo - Select Enabled to maximize the performance of the Turbo-Mode feature built-in the CPU. Please note that an improper hyper-turbo setting may impede the stability of your machine. The options are Disabled and Enabled." Prior to Sandy Bridge, Xeons could be overclocked in discrete ways, such as memory and CPU speed only. My two dual-CPU EVGA SR-2 systems (seven PCIe 2.0 slots, first placed in service in 2010) have dual 3.46 GHz Nehalem Xeons (X5690s), both now turboing in 14 steps from 3.7 GHz to over 5.0 GHz, with the PCIe bus overclocked separately by 4%. I still use those systems; they're loaded with GTX 590s for hybrid (GPU+CPU) progressive rendering and one GTX 780 6GB for video editing.
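The Xeon ceiling mentioned above is easy to put into numbers. A small sketch comparing the ~7.55% lockstep limit with a typical enthusiast i7 overclock; the i7 percentage is an assumption for illustration.

```python
def overclocked_ghz(base_ghz, pct):
    """Effective clock after a percentage overclock."""
    return base_ghz * (1 + pct / 100)

xeon_base = 3.46  # e.g. a Nehalem X5690
i7_base = 3.4     # an assumed enthusiast i7 baseline

xeon_max = overclocked_ghz(xeon_base, 7.55)  # lockstep ceiling
xeon_typ = overclocked_ghz(xeon_base, 4.5)   # usual stable range
i7_oc = overclocked_ghz(i7_base, 30.0)       # assumed i7 overclock

print(f"Xeon ceiling: {xeon_max:.2f} GHz, typical: {xeon_typ:.2f} GHz")
print(f"i7 example:   {i7_oc:.2f} GHz")
```

The gap is a few hundred MHz on the Xeon versus a GHz or more on an unlocked i7, which is why the lockstep limit matters in practice.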
smicha wrote:Guys,

I have some incredible news - I am building my next workstation with 7 watercooled GTXs and dual Xeons on an ASRock Rack mobo - imagine that: I just wrote an email to ASRock support and they released a new BIOS exclusively for me to handle 7 GPUs without any problems! And it took only one day. Now when I think about Asus... what a shame.

Stay tuned for build log ;)
Good job moving ASRock to provide good customer service.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

@Smicha
Great to hear your experience with ASRock!
I know the build will be awesome. Are you set on getting the Titan X Pascal?
In the past, ASRock helped me very quickly with an updated BIOS on the X79 Extreme 11, even when a first test BIOS failed to give me what I wanted. However, I did stump them on getting me a 14th GPU.

@Tutor
Is it possible that a different CPU can 'unlock' or even 'expand' PCI lanes (not slots), such that you could add more GPUs than you could with a previous CPU? If so, what is it that helps expand them - CPU memory cache?
Nutshell = what CPU gives me 14 GPUs :lol:
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notius, always remember that you've already achieved a magnificent feat - getting 13 GPUs running on a single-CPU Windows system. Who would have thought it could be done!*/ Obviously you - a man of great imagination and talents.
Notiusweb wrote:... .
@Tutor
Is it possible that a different CPU can 'unlock' or even 'expand' PCI lanes (not slots) such that you could add more GPU than you could with a previous CPU?
Yes, in part it's possible. There have to be enough lanes for the data streams to travel. But PCIe lane count doesn't appear to be what is keeping your setup from taking one more GPU.
Notiusweb wrote:If so, what is it that helps expand, CPU memory cache?
Lanes are only transport routes/data pathways. If there's no traffic/data to be moved, lanes don't get used. If the number of lanes matches or exceeds the traffic's needs, all is good. If there are fewer lanes than the traffic needs, some items don't get moved. In sum, lanes are just paths. The CPUs and memory are the crucial components for handling IO space needs. For example, a system running two (or more) Xeon CPUs with access to adequate memory will tend to have the IO space capacity to support more GPU processors than a single i7 CPU system does. Here's a simplified description of how IO space was intended to be allocated when the PCI spec (which preceded the PCIe spec) was designed - http://electronics.stackexchange.com/qu ... erent-from - and a more in-depth version here - http://resources.infosecinstitute.com/p ... nsion-rom/ .
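A toy model of that IO-space ceiling: x86 exposes only 64 KB of legacy IO port space, and PCI-to-PCI bridges hand it out in 4 KB windows, so each GPU behind a bridge that requests an IO BAR burns a 4 KB chunk. The five-window reservation for onboard peripherals below is an assumed figure for illustration.

```python
# Toy model of legacy IO-port exhaustion limiting GPU count.
# x86 exposes 64 KB of IO port space, and PCI-to-PCI bridges
# forward it in 4 KB-aligned windows, so each GPU behind a bridge
# that requests an IO BAR consumes a 4 KB chunk.
IO_PORT_SPACE = 64 * 1024
BRIDGE_WINDOW = 4 * 1024

# Assumed: onboard peripherals (USB, SATA, NICs...) claim 5 windows
# first; boards whose peripherals skip IO BARs free these for GPUs.
onboard_windows = 5

max_gpus = IO_PORT_SPACE // BRIDGE_WINDOW - onboard_windows
print(f"At most {max_gpus} GPUs can get an IO window in this model")
```

The limit moves with how many windows the firmware reserves for onboard devices rather than with CPU clock, which is consistent with why server boards whose peripherals don't request IO space can host 12-13 GPUs.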
Notiusweb wrote: Nutshell = what CPU give me 14 GPU :lol:
No single CPU exists that cures the ailment you're concerned about (but if anyone can find such a cure, I'd place my bet only on your industriousness*/). To maximize the working GPU processor count, a lot of different system attributes have to converge. The situation is like a team where every participant has to be at the top of his/her game. I don't believe the problem you're trying to cure is amenable to a "[single] different CPU" cure. It is more likely attributable to your CPU count, to their not being Xeons, and/or, more likely still, to your OS selection. Just as running Linux can let one exceed your 13-GPU-processor count when every other participant is shouldering its responsibilities, Windows/Microsoft holds the more likely cure for the ailment.**/ Fat chance of getting Microsoft to help you achieve your goal; but you can ask them - just don't feel belittled by the background snickering and laughter.

*/ Compare the guidance from Amfeltec with your achievement of running 13 GPU processors on a single CPU motherboard:

"The motherboard limitation is for all general purpose motherboards. Some vendors like ASUS supports maximum 7 GPUs, some can support 8.
All GPUs requesting IO space in the limited low 640K RAM. The motherboard BIOS allocated IO space first for the on motherboard peripheral and then the space that left can be allocated for GPUs.
To be able support 7-8 GPUs on the general purpose motherboard sometimes requested disable extra peripherals to free up more IO space for GPUs.
The server type motherboards like Super Micro (for example X9DRX+-F) can support 12-13 GPUs in dual CPU configuration. It is possible because Super Micro use on motherboard peripheral that doesn’t request IO space."

And now you're stumped because you can't get 14 GPU processors running on that single-CPU motherboard! Well, here's some math to put your achievement into its proper perspective:

13 (your achievement) / 8 (what Amfeltec said can be done) = 1.625; so you've exceeded Amfeltec's prediction by about 63%. That's like a student being given a 100-question final exam, voluntarily adding 63 more difficult questions, and answering all 163 correctly. What grade would that student deserve? I'd say "Genius." Please, prove everyone wrong again. As far as an X9 (or X10) DRX+-F system is concerned, it wouldn't surprise me if you got 21 to 22 GPU processors working correctly (13 * 1.63 ≈ 21.2).

**/ Apple's latest OS removed the 4-5 GPU processor count limit of the past. The limit was apparently reduced to 2-3 GPU processors in an intervening OS release, after I had stopped updating my Macs' OS. It may be that Apple reduced the limit to 2-3 GPUs for a while because Apple had stopped making machines that were user-adaptable, and their top-of-the-line system was and is the 2x AMD GPU trashcan, a/k/a the Mac Pro 2013.
Last edited by Tutor on Sun Aug 28, 2016 3:46 pm, edited 7 times in total.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.