Best Practices For Building A Multiple GPU System

Discuss anything you like on this forum.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:You can network 110 GPU for 180K CUDA on Furryball?...LOL
Imagine that in one rig, all watercooled.
do they (F-Ball) have some sort of .OCS / .ORBX -like export so you can get a project from main PC to all render PCs? Or does each PC have its own independent creation and render workflow?
I don't have that many FurryBall licenses - only two for RT. FurryBall doesn't come with a true network renderer - only remote rendering, i.e., you must have FurryBall installed with a valid license on the server side (and on that computer you must start the FurryBall server from the FurryBall.exe application). On the client side (where you have Maya, C4D, or 3ds Max) you can install FurryBall as a client only and connect via the FurryBall menu.

The upside to FurryBall in terms of speed is that most FurryBall updates/upgrades have resulted in faster rendering, at least during my two years of owning it. For example, this is how much faster RT V1.3 is over RT V1.2 - "Huge speedup - 20%-35% in most cases - even 300% in some scenes." [ http://furryball.aaa-studio.eu/news/fur ... edits.html ].
Notiusweb wrote:Hopefully with GPU speed increases we'll hit a real time snap render type thing, like 4K for 16000 samples in under a second per frame for big rigs like yours, and mini rigs like mine.
The drag on this occurring sooner rather than later for many will be the tendency for our application supplier(s) to add additional features that consume more and more resources of the most recent/powerful GPUs, like Octane V3 vs. V2 - the price we often have to pay for getting more feature-rich tools.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

My review of Nvidia presentation:

"Blah-blah-blah....A.I."
"Blah-blah-blah...self driving cars",
"Oh...ummm, no consumer-level Pascal GPU yet..."
"Bye!..."

:x
Last edited by Notiusweb on Tue Apr 05, 2016 6:58 pm, edited 1 time in total.
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

& all those rumors about 32GB Pascal-based gaming cards =))) with 2x-3x performance.. - man, a few weeks before this event that was a joke =DDD
Last edited by glimpse on Wed Apr 06, 2016 6:20 am, edited 1 time in total.
itou31
Licensed Customer
Posts: 377
Joined: Tue Jan 22, 2013 8:43 am

So I can buy a 980Ti ... with the release of Pascal, many gamers will sell their 980Tis ... :roll:
I7-3930K 64Go RAM Win8.1pro , main 3 titans + 780Ti
Xeon 2696V3 64Go RAM Win8.1/win10/win7, 2x 1080Ti + 3x 980Ti + 2x Titan Black
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

Or we can get one of those DGX-1's or whatever they are called, for ONLY $129,000 (applause apprehensively starts, then dithers confusingly).
It's deep-learning A.I., so it will do all the design and rendering for us..
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
linvanchene
Licensed Customer
Posts: 783
Joined: Mon Mar 25, 2013 10:58 pm
Location: Switzerland

Notiusweb wrote:Or we can get one of those DGX-1's or whatever they are called, for ONLY $129,000 (applause apprehensively starts, then dithers confusingly).
It's deep-learning A.I., so it will do all the design and rendering for us..
I got the impression that they wanted to focus on developers only at GTC.
I wonder if the 16GB VRAM of the Tesla P100 would be a limiting factor for GPU rendering with the DGX-1.

To me it seems the focus of Tesla is more on memory bandwidth than on capacity.


Nevertheless it seems we now at least have some more hints about the Pascal architecture.
Especially NVLink between GPU and CPU seems interesting:

compare:
viewtopic.php?f=40&t=53430
Win 10 Pro 64bit | Rendering: 2 x ASUS GeForce RTX 2080 Ti TURBO | Asus RTX NVLink Bridge 4-Slot | Intel Core i7 5820K | ASUS X99-E WS| 64 GB RAM
FAQ: OctaneRender for DAZ Studio - FAQ link collection
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

by linvanchene » Wed Apr 06, 2016 5:35pm

I got the impression that they wanted to focus on developers only at GTC.
Yeah, I wonder which business segment fuels Nvidia more: the consumer-level products, or the big enterprise/academia compute products. I would have been interested to see them do something new with the A.I. UI, as opposed to hearing specs and an update on self-driving cars.

When he said about the race car, "that's a great looking GPU," I was waiting for him to say next, "and here's an even better one for you...introducing the new PTX-150, 32GB VRAM, 6,144 CUDA cores, 2,064 MHz core clock..."

The big question in my mind was: is Pascal 'ready' or is it 'delayed'? It had been touted as this amazing successor to the previous generation of consumer-level GPUs, and now it only got mentioned as part of the enterprise products. I wonder why they didn't even mention an upcoming consumer-level GPU. I say this as someone who will buy the next GPU...I'm all for A.I. and the coming singularity, but hurry up...too slow!
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:
by linvanchene » Wed Apr 06, 2016 5:35pm

I got the impression that they wanted to focus on developers only at GTC.
Yeah, I wonder which business segment fuels Nvidia more: the consumer-level products, or the big enterprise/academia compute products. I would have been interested to see them do something new with the A.I. UI, as opposed to hearing specs and an update on self-driving cars.

When he said about the race car, "that's a great looking GPU," I was waiting for him to say next, "and here's an even better one for you...introducing the new PTX-150, 32GB VRAM, 6,144 CUDA cores, 2,064 MHz core clock..."

The big question in my mind was: is Pascal 'ready' or is it 'delayed'? It had been touted as this amazing successor to the previous generation of consumer-level GPUs, and now it only got mentioned as part of the enterprise products. I wonder why they didn't even mention an upcoming consumer-level GPU. I say this as someone who will buy the next GPU...I'm all for A.I. and the coming singularity, but hurry up...too slow!
Those business segments are likely to pay more per unit than we'd ever imagine. Consider how many cars and other units could use this technology. June-ish is the timeframe for them. http://wccftech.com/nvidia-tesla-p100-gp100-june-2016/ . According to Wccftech, Computex 2016 [late May - http://www.businesswire.com/news/home/2 ... e-COMPUTEX ] may hold the announcement for GTX users. So, it's likely to poke its head up this summer. But Titans aren't first-release cards. In the meantime, here're some things to consider:

1) http://cdn.wccftech.com/wp-content/uplo ... -GPU_1.jpg

2) GPU COMPARISON Tesla Maxwell GM200* VS. Tesla Pascal GP100*
SMs = 24 VS. 56
TPCs = 24 VS. 28
FP32 CUDA Cores / SM = 128 VS. 64
FP32 CUDA Cores / GPU = 3072 VS. 3584
FP64 CUDA Cores / SM = 4 VS. 32
FP64 CUDA Cores / GPU = 96 VS. 1792
Base Clock = 948 MHz VS. 1328 MHz
GPU Boost Clock = 1114 MHz VS. 1480 MHz
FP64 GFLOPs = 213 VS. 5304
Texture Units = 192 VS. 224
Memory Interface = 384-bit GDDR5 VS. 4096-bit HBM2
Memory Size = Up to 24 GB VS. 16 GB
L2 Cache Size = 3072 KB VS. 4096 KB
Register File Size / SM = 256 KB VS. 256 KB
Register File Size / GPU = 6144 KB VS. 14336 KB
TDP = 250 Watts VS. 300 Watts
Transistors = 8 billion VS. 15.3 billion
GPU Die Size = 601 mm² VS. 610 mm²
Manufacturing Process = 28-nm VS. 16-nm

GPU COMPARISON Maxwell GM200 VS. Pascal GP100
Compute Capability = 5.2 VS. 6.0
Threads/Warp = 32 VS. 32
Max Warps/Multiprocessor = 64 VS. 64
Max Threads/Multiprocessor = 2048 VS. 2048
Max Thread Blocks/Multiprocessor = 32 VS. 32
Max 32-bit Registers/SM = 65536 VS. 65536
Max Registers/Block = 32768 VS. 65536
Max Registers/Thread = 255 VS. 255
Max Thread Block Size = 1024 VS. 1024
CUDA Cores/SM = 128 VS. 64
Shared Memory Size/SM Configurations (bytes) = 96K VS. 64K

The info source for the Tesla P100 comparison is [ http://wccftech.com/nvidia-gp100-pascal ... uda-cores/ ]. I added "*" to highlight factor changes that interested me. Where the "*" appears in a header, that means that every change below that header was of interest to me. Also, I bolded the ones that were most interesting to me.

3) We know that, historically, the GTX card whose component specs best approximate a Tesla's is the Titan. Additionally, we know that, historically, the GTXs that compare most closely to the Teslas run a lot faster. So GTX Titan Pascal owners may well experience the performance of a card that has, among other things, about 17% more CUDA cores {with fewer cores per SM, but 2.3x the SM count}, clocked much higher {about 1.4x - say about 1500 MHz base for Titans and even a bit higher for Tis}, with 33% more memory than a Titan X's 12 GB [and a 33% larger L2 cache] that is a lot faster, all on a GTX card with, at a minimum, two eight-pin PCIe power connectors*/. I still believe that all of these changes will result in a GTX Titan Pascal that is twice as fast at rendering as a Titan X, especially if our 3d rendering software authors can find a way to take full advantage of all that mixed-precision computing offers, especially FP16**/ (twice as fast as FP32), to the maximum extent possible.
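The "twice as fast" expectation can be roughed out from the raw table numbers (illustrative arithmetic only - real renderer gains depend on far more than peak FLOPs):

```python
# Naive peak-FP32 comparison from the table above (cores x GHz x 2 FLOPs).
gm200_fp32 = 3072 * 1.114 * 2   # ~6844 GFLOPs (Titan X class)
gp100_fp32 = 3584 * 1.480 * 2   # ~10609 GFLOPs
print(gp100_fp32 / gm200_fp32)  # ~1.55x on raw FP32 alone

# If a renderer could push most of its math to packed FP16
# (2 FP16 ops per core per cycle on GP100), the theoretical
# gap doubles again - the basis for the "twice as fast" hope.
```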

*/ Just like my GTX 590s, 690s and Titan Zs.

**/ "Deep learning workloads represent a perfect scenario where mixed precision can be leveraged to pretty much double the performance. These workloads inherently require less precision and using FP16 instructions would result in very significant reductions in memory usage that will allow deep learning to occur in considerably larger networks. Essentially allowing machines to learn much more effectively.
Because each Pascal CUDA core can run two FP16 operations at once and each 32-bit register can store two FP16 values at once, the GP100 GPU can effectively do FP16 compute work at twice the speed of FP32, and this is where that doubling in performance comes from." [ http://wccftech.com/nvidia-gp100-pascal ... uda-cores/ ]

Read more: http://wccftech.com/nvidia-gp100-pascal ... z454TvXAuK .
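The "two FP16 values in one 32-bit register" point quoted above can be illustrated in plain numpy (a software analogy only - numpy itself doesn't do packed-FP16 arithmetic):

```python
import numpy as np

# Two FP16 values occupy exactly the 32 bits of one FP32 value,
# which is why GP100 can issue FP16 work at twice the FP32 rate
# and why FP16 halves memory traffic for precision-tolerant work.
pair_fp16 = np.array([1.5, -2.25], dtype=np.float16)  # 2 x 16 bits
one_fp32 = np.array([1.5], dtype=np.float32)          # 1 x 32 bits
assert pair_fp16.nbytes == one_fp32.nbytes == 4

# Reinterpret the pair as a single 32-bit word and back again,
# mimicking an FP16 pair stored in one 32-bit register:
packed = pair_fp16.view(np.uint32)   # one uint32 holds both halves
unpacked = packed.view(np.float16)   # recover the two FP16 values
assert unpacked.tolist() == [1.5, -2.25]
```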

P.S. Interesting Iray Development [ http://www.nvidia.com/object/nvidia-iray.html ] - 90 day free trial of Iray: http://www.nvidia.com/object/iray-plugin-trial.html .
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

a lot of talk & little conclusion, to sum things up.. so far, every piece out there is a guess..written to drive Your attention & monetize Your clicks. How many more rumor links are You going to post in a thread labeled "Best Practices.."? Best practices of what??? rumoring?

important stuff for Octane Render users: the new Pascal card is ~10 TFLOPs in SP, while a GM200-based card (Titan X or 980Ti) gives what, around 7 (if I'm not mistaken..). Even after some optimisation, which could take weeks, You're looking at a maximum performance increase of roughly 40% (taking out other factors).
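The ~40% ceiling here is just the ratio of the quoted single-precision numbers (the TFLOPs figures are the rough estimates from the post above, not confirmed specs):

```python
# Ratio of the rumored Pascal SP throughput to a GM200 card's,
# using the rough TFLOPs figures quoted in the post above.
pascal_sp_tflops = 10.0   # rumored new Pascal card
gm200_sp_tflops = 7.0     # Titan X / 980 Ti ballpark

speedup = pascal_sp_tflops / gm200_sp_tflops - 1
print(f"~{speedup * 100:.0f}% theoretical headroom")  # ~43%
```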

Another thing, a simple one.. companies like to release cards for summer, because that's the time when school is finished & kids are allowed to play more..or around the new year/Xmas season, 'cos that's when parents are more likely to buy some gifts.. but that doesn't mean Nvidia has to stick with that =)

So all these upcoming rumors.. like all those before (hey, Nvidia is going to release a new GPU.. - just because it did the year before..its name is going to be &*^*80 - just because that was the name a year ago..etc) are based on thin air..
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

The origin of a rumor is someone's guess.
glimpse wrote:a lot of talk & little conclusion, to sum things up.. so far, every piece out there is a guess..written to drive Your attention & monetize Your clicks.
I agree with your statement in part. However, I'm not trying to get your attention or monetize your clicks, but I do admit that I guess, that you guess, and that others guess - and, in particular, that there're guesses in many posts that I've read, written, and referenced in this thread.
glimpse wrote:How many more rumor links are You going to post in a thread labeled "Best Practices.."? Best practices of what??? rumoring?
Rumors can be true or false, educated or uneducated, etc. However, an outcome is a wholly different matter. An outcome will tell the true measure of the accuracy of a guess. Until there's an outcome, many people may guess. Anyone who asks for more than a guess before there's an outcome should know that they're truly seeking a guess. So, I guess I'll post links to other guessers as often as I believe it promotes the making of educated guesses to aid others who are guessing. Admittedly, the sources that I referenced contained fact and guess, but we all guess; even you and I guess and post guesses, e.g., see below.

I see you guessing here. I do understand the reasoning in your guess, but I don't fully agree with it.
glimpse wrote:important stuff for Octane Render users: the new Pascal card is ~10 TFLOPs in SP, while a GM200-based card (Titan X or 980Ti) gives what, around 7 (if I'm not mistaken..). Even after some optimisation, which could take weeks, You're looking at a maximum performance increase of roughly 40% (taking out other factors).


I see you guessing here, although I agree that past behavior doesn't foreclose different behavior in the future. Past behavior is useful, however, in the guessing game.
glimpse wrote:Another thing, a simple one.. companies like to release cards for summer, because that's the time when school is finished & kids are allowed to play more..or around the new year/Xmas season, 'cos that's when parents are more likely to buy some gifts.. but that doesn't mean Nvidia has to stick with that =)
I see you guessing here. Moreover, I do disagree with you that forming expectations based on past patterns/behavior is akin to being "based on thin air." Moreover, even thin air is a reality, i.e., it's still air.
glimpse wrote:So all these upcoming rumors.. like all those before (hey, Nvidia is going to release a new GPU.. - just because it did the year before..its name is going to be &*^*80 - just because that was the name a year ago..etc) are based on thin air..


Whether the rendering speed increase from Maxwell Titan to Pascal Titan is 40%, or much more, or much, much more is rumor/guessing at this time. Many factors - such as all of Pascal's new features and their interplay, how well the 3d rendering software developers take full advantage of them, and even random factors such as luck or serendipity - will determine the outcome, i.e., Pascal Titan's percentage speed increase over Maxwell Titan. To be sure, how all of these variables will interplay and be realized is rumor/guessing until there's an outcome. However, I also recognize that our receipt of a certain outcome doesn't necessarily mean that we got all that was there. Many people, including you and I, guess much of the time. Moreover, many people seek others' best guesses to make purchasing decisions. Thus, I repeat: I guess I'll post links to other guessers as often as I believe it promotes the making of educated guesses to aid others who are guessing.
Last edited by Tutor on Wed Apr 06, 2016 9:38 pm, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multiOS systems - Character limit prevents detailed stats.
Return to “Off Topic Forum”