Best Practices For Building A Multiple GPU System

Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

smicha wrote: Guys,

Any advice when only 4 GPUs are visible in Windows although more are installed? Is there anything I can do apart from a clean reinstall of the Nvidia drivers?

All 7 were visible when the monitor was connected via DP cable to the highest card. When it is connected to the lowest card via DVI, only 4 of them are visible. The same thing happened with an HDMI cable. Now the DP cable is connected to the 1st card and only 4 are visible... Any clues?
Smicha,

This assumes that the GPUs are in working order and properly seated, and that the cables (PCI-e, splitter, or riser) are likewise in working order and properly seated. Since I often move my GPUs between systems, I've yet to find a case where the following procedure [ or my modification of it, described in the footnote */ ] doesn't dispel those ghosts, even in situations just like the one you describe. So consider Otoy's introduction below to be under-inclusive of the situations where the numbered steps [ or my mod of them ] work:

"Windows and the Nvidia driver see all available GPU’s, but OctaneRender™ does not.

There are occasions when using more than two video cards that Windows and the Nvidia driver properly register all cards, but OctaneRender™ does not see them. This can be addressed by updating the registry. This involves adjusting critical OS files, it is not supported by the OctaneRender™ Team.

1) Start the registry editor (Start button, type “regedit” and launch it.)
2) Navigate to the following key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}
3) You will see keys for each video card starting with “0000” and then “0001”, etc.
4) Under each of the keys identified in 3 for each video card, add two dword values:
DisplayLessPolicy
LimitVideoPresentSources
and set each value to 1
5) Once these have been added to each of the video cards, shut down Regedit and then reboot.
6) OctaneRender™ should now see all video cards." [ https://docs.otoy.com/#sector_9 ]

*/ In situations where following the above steps once doesn't work, I've had success by installing only as many GPUs as load without issue, performing the steps listed above, shutting down, adding one more GPU, restarting, and then repeating the whole process as each additional GPU is added. Once all GPUs have been added successfully, I perform the process once more to better accommodate the next addition.
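
The numbered registry steps quoted above can also be written out as a .reg file instead of being typed into Regedit by hand. What follows is only a minimal sketch in Python, not something Otoy or the poster provides. It assumes the video-card subkeys are numbered consecutively from 0000 and that every one of them belongs to a card you want to change, which may not hold on a machine with other display adapters. The function name `build_reg_file` is made up for illustration. Back up the registry and review the generated text before importing anything.

```python
# Sketch: generate .reg text for the two DWORD values named in the quoted steps.
# Assumption: card subkeys are 0000, 0001, ... with no gaps.

CLASS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E968-E325-11CE-BFC1-08002BE10318}"
)
VALUE_NAMES = ("DisplayLessPolicy", "LimitVideoPresentSources")


def build_reg_file(card_count):
    """Return .reg text that sets both values to 1 under subkeys 0000..N-1."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for index in range(card_count):
        # One section per video-card subkey (step 3 of the quoted text).
        lines.append("[{}\\{:04d}]".format(CLASS_KEY, index))
        for name in VALUE_NAMES:
            # dword:00000001 is the .reg spelling of a DWORD set to 1 (step 4).
            lines.append('"{}"=dword:00000001'.format(name))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Prints the text for 7 cards; save it as a .reg file only after checking it.
    print(build_reg_file(7))
```

Generating the file rather than writing to the registry directly keeps the change reviewable, and the same file can be re-imported after each card is added, as the footnote describes.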
Last edited by Tutor on Fri May 06, 2016 1:37 pm, edited 3 times in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Tutor

Notiusweb wrote:
smicha wrote: Guys,

Any advice when only 4 GPUs are visible in Windows although more are installed? Is there anything I can do apart from a clean reinstall of the Nvidia drivers?

All 7 were visible when the monitor was connected via DP cable to the highest card. When it is connected to the lowest card via DVI, only 4 of them are visible. The same thing happened with an HDMI cable. Now the DP cable is connected to the 1st card and only 4 are visible... Any clues?
Hey Smicha, does Device Manager recognize only 4, or all 7? Or does it recognize all 7, but 3 have a yellow exclamation point?

When I see one or more exclamation points, I just reinstall the Nvidia driver until no yellow exclamation marks appear. In my experience, the only times this fails to clear the yellow marks are when I/O space has been fully exhausted, when a video card needs repair, or when cabling (PCI-e, splitter, or riser) is bad or not properly seated.
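
As a complement to checking Device Manager by eye, the number of cards the Nvidia driver actually loaded can be counted from the output of `nvidia-smi -L`. The sketch below is an illustration, not part of anyone's procedure in this thread. It assumes `nvidia-smi` is on PATH and that each line has the usual `GPU <n>: <name> (UUID: GPU-...)` shape. The function names are made up. A card flagged with a yellow exclamation mark would normally be absent from this list, so a shortfall only says how many are missing, not why.

```python
import re
import shutil
import subprocess

# Assumed line shape of `nvidia-smi -L` output:
#   GPU 0: <name> (UUID: GPU-xxxxxxxx-....)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_listing(listing):
    """Return (index, name, uuid) tuples found in `nvidia-smi -L` output."""
    gpus = []
    for line in listing.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def missing_count(listing, expected):
    """How many of the expected cards the driver did not list."""
    return max(0, expected - len(parse_gpu_listing(listing)))


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    output = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    print("driver lists", len(parse_gpu_listing(output)), "GPU(s)")
    # 7 is the card count from this thread; change it for another build.
    print("missing:", missing_count(output, expected=7))
```

Running this after each driver reinstall gives a quick yes/no on whether the reinstall changed anything, and the UUIDs make it possible to tell which physical cards are the ones showing up.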
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

It recognizes only 3 for now. I am running clean installs now.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

smicha wrote: It recognizes only 3 for now. I am running clean installs now.
If you have 7 plugged in and powered on and Device Manager sees only 3 (meaning literally only 3, not 7 with 4 not working), I would say the BIOS could also be involved here somehow. You could try the ol' install-1-at-a-time thing, where you do a clean install with only 1 powered on (take out the PSU plugs from the other cards), and then add another, then another, etc. You may hit a point where it stops recognizing cards, and then you could tackle that 'item' as it were (maybe an ASUS 4G decoding thing, or a lane arrangement thing...). Sometimes it's hard when you install 7 at once and say "Go!" to the PC, because the PC is like "WTF?...." It might think it's in some SLI mode where only 4 lanes are to be used, or something.

I don't want you to F up the board and get a "00" if it's an ASUS, however :?
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
smicha

Thanks. I am now doing clean installs one card at a time, powering on only one more card each time rather than all of them at once. I'll report how it goes.
smicha

So here is the issue:

1. All 7 were working fine. I was using a DP cable connected to the 1st GPU.
2. The guy who received the machine connected a monitor via DVI cable to the last (7th) GPU, which was the only GPU with an uncut DVI port, and then only 4 of them were visible.
3. He then tried HDMI and DP cables connected to the 1st GPU: only 4 were visible.
4. When PCI-e slots 1, 3, 5, 7, and 2 are populated, the card in slot 2 seems to be undetectable, even when powering the cards one by one and doing a clean Nvidia install every time.
5. My assumption is that when he connected the monitor via DVI cable to the lowest GPU (although I told him to use a DP cable...), the driver messed something up.
6. Current situation: we are unplugging the machine from the wall, unplugging all GPU cables from the GPUs, unplugging the 24-pin cable from the mobo, and waiting for the PSU to discharge... until Monday.
7. We'll see... If that doesn't help, the system must be drained and the GPUs removed, the BIOS set to defaults, and 4G decoding set again before more than 4 GPUs are populated.
Notiusweb

smicha wrote: So here is the issue:

1. All 7 were working fine. I was using a DP cable connected to the 1st GPU.
2. The guy who received the machine connected a monitor via DVI cable to the last (7th) GPU, which was the only GPU with an uncut DVI port, and then only 4 of them were visible.
3. He then tried HDMI and DP cables connected to the 1st GPU: only 4 were visible.
4. When PCI-e slots 1, 3, 5, 7, and 2 are populated, the card in slot 2 seems to be undetectable, even when powering the cards one by one and doing a clean Nvidia install every time.
5. My assumption is that when he connected the monitor via DVI cable to the lowest GPU (although I told him to use a DP cable...), the driver messed something up.
6. Current situation: we are unplugging the machine from the wall, unplugging all GPU cables from the GPUs, unplugging the 24-pin cable from the mobo, and waiting for the PSU to discharge... until Monday.
7. We'll see... If that doesn't help, the system must be drained and the GPUs removed, the BIOS set to defaults, and 4G decoding set again before more than 4 GPUs are populated.
Can you System Restore it to a point before you gave it to him? That may at least restore the registry. If not, the next time you get it working, create a manual restore point on the C drive so that you can at least reload the registry settings as a backup for the OS. But it sounds like connecting the cable to the 7th GPU mucked up the BIOS into some other mode...
7 GPU watercooled, what a pain in the neck to undo. Been there, I feel for you man :|
But keep at it, you'll get it working!
smicha

That's just what I thought about the registry. Now I think that if discharging the PSU doesn't work, maybe a clean Windows install will fix it. We'll know by Monday. Thank you Notiusweb!
Tutor

smicha wrote: That's just what I thought about the registry. Now I think that if discharging the PSU doesn't work, maybe a clean Windows install will fix it. We'll know by Monday. Thank you Notiusweb!
In view of your last three posts, this is my recommendation:

1) In Windows, go to "Control Panel/All Control Panel Items/Programs and Features" and uninstall each program that is listed as "Nvidia {or CUDA} ...". This will likely require interim reboots to get rid of all of them. After all Nvidia/CUDA drivers, etc. are completely uninstalled, shut down completely.
2) Plug in only the video card that is to be used as the display card and connect it to the monitor/display device through whichever video-out port on that card is to be used in the future. Here that seems to be the DVI port of the only card with its DVI port still intact.
3) With only that one video card physically installed [ I'm not certain of the membership in the water-cooling setup */ ], reinstall the Nvidia video & CUDA drivers, etc. and then shut down completely. Restart and test Octane with just that one DVI-connected video card installed. If all is well --
4) Reinstall the smallest possible number of additional cards. Again, I'm not certain of the membership in the water-cooling setup.
5) Restart and allow/start the Nvidia video and CUDA driver installs for the additional cards. If all is fine with them, shut down and add further cards incrementally if possible, allowing/starting the Nvidia video and CUDA driver installs after each addition, until all cards are in place and working.
6) If all is not well at any point, use "Regedit" to delete the registry entries for each video card added just before the problem began, so ideally you need to keep track of which registry entry relates to which card(s) along the way. Then restart at point no. 1, above, applying the registry hack that I recommended above after each successful addition of a video card or group of cards, namely here: viewtopic.php?f=40&t=43597&start=520#p272771 .


*/ This might require leaving one or more other video cards attached to that video card while they remain unplugged from the PCI-e ports and from the PCI-e power connectors. This might also require placing non-conductive protection underneath and around them. But this is, of course, just a shot in the dark: I can't see the hardware itself, so I'm running blind.
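
Point 6 above asks you to keep track of which registry entry relates to which card. One way to do that is to snapshot the numbered subkeys after each card is added and record which ones are new. The following Python sketch is offered under assumptions only. It assumes each added card shows up as one or more new four-digit subkeys under the display class key quoted earlier in the thread. The file name `gpu_subkeys.json`, the labels, and all function names are hypothetical. It only reads the registry and deletes nothing.

```python
import json
import os
import re
import sys

CLASS_PATH = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E968-E325-11CE-BFC1-08002BE10318}"
)
CARD_KEY = re.compile(r"^\d{4}$")       # matches 0000, 0001, ...
SNAPSHOT_FILE = "gpu_subkeys.json"      # hypothetical file name


def list_card_subkeys():
    """Read the numbered subkey names under the display class key (Windows only)."""
    import winreg  # imported here so the helpers below also load on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CLASS_PATH) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        names = [winreg.EnumKey(key, i) for i in range(subkey_count)]
    return sorted(name for name in names if CARD_KEY.match(name))


def record_new_subkeys(log, card_label, before, after):
    """Store which subkeys appeared after adding the card(s) named card_label."""
    log[card_label] = sorted(set(after) - set(before))
    return log[card_label]


def main(card_label):
    # The snapshot lives on disk because every step involves a shutdown.
    state = {"seen": [], "log": {}}
    if os.path.exists(SNAPSHOT_FILE):
        with open(SNAPSHOT_FILE) as handle:
            state = json.load(handle)
    current = list_card_subkeys()
    added = record_new_subkeys(state["log"], card_label, state["seen"], current)
    state["seen"] = current
    with open(SNAPSHOT_FILE, "w") as handle:
        json.dump(state, handle, indent=2)
    print(card_label, "->", added)


if __name__ == "__main__" and sys.platform == "win32":
    main(sys.argv[1] if len(sys.argv) > 1 else "unlabeled")
```

Run it once after each successful addition with a label of your choosing (for example the slot number). If a later addition goes wrong, the saved log shows which subkeys belong to the cards added just before the problem began.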
smicha

Tutor,

Thank you for your time. Points 1-5 were done previously, without success. I think #6 is the key. BTW, do you think a clean install of Win10 amounts to the same thing as having proper (previously working) registry settings?