ASRock 13 GPU mining MB

Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Notiusweb wrote:I have found that a Pascal Titan X V1 (not the XP) on a 1x riser is = to reg 'old school' Titan X on Mobo x16 as far as render speed.
Because I've found that my GPUs on x1 risers and Amfeltec GPU Oriented Splitters render animations, using Octane, as if the GPUs were a generation or two earlier, I'm in the process of building up my studio with a goal of having 6 x Supermicro X9DRX systems, each with 10 to 11 GPUs on x16 risers (but running effectively at x8 speeds, i.e., the motherboard's x8-sized slot and x8 data transport speed). The x16 risers sit atop x8-to-x16 low-profile riser cards. See, e.g., https://www.ebay.com/sch/i.html?_from=R ... s&_sacat=0 .
Notiusweb wrote:For me, I run 6 on risers at 1x, and just use a Titan X as the mobo visual card, but know if I had all planted on the mobo theoretically, I would get mega high temps.
All of my systems (even those with water cooling) run, to some extent or completely, in open air*/, but the current generation (Pascals) doesn't really need to be so exposed.
Notiusweb wrote:The iClone one rendered internally PBR off 1 Titan X @1 frame/sec, for 900 seconds = 15 minutes (3840x2160)
-The Octane Render rendered using 5x Pascal Titan X V1 & 1 Old School Titan X, all at 1x riser, in 4.5 hrs (1920x1080)

iClone7
https://vimeo.com/240257069

Octane Render
https://vimeo.com/240257228


Notiusweb, great job! Thanks for providing this info and the links. So cool - I liiiiiiike it.

*/ I move my GPUs around to better accommodate different job demands.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Rik
Licensed Customer
Posts: 423
Joined: Wed Jul 28, 2010 8:57 pm

Afraid I'm a little confused with PCIe x1, x8, x16, etc. Is x8 a bottleneck compared to x16?

From what I understand, these are the speeds at which information can travel from the CPU to the GPUs.

So whilst this won't speed up the pretty 'colouring in' part of the rendering process, it will affect the length of time taken for the black screen of misery that is 'scene evaluation'.

95% of the time, when I hit the 'open Octane viewport' it's just for a quick test render. Once the colouring in has started it only takes a few seconds to see what I need to (once the 'scene evaluation' misery has finally finished.)

I think I'm trying to say that the bottleneck for me is the 'scene evaluation' process.

What's the best hardware config for this, and is PCIe x16 going to make a difference?

For the other 5% of renders, the high res, many samples ones, I know it's going to take 20 minutes for the colouring in but that's fine as my dog needs a walk.

quick test render (e.g. is the sun in the correct position) = 70 seconds scene evaluation, 5 seconds colouring in required (annoying ratio)
final production render = 70 seconds scene evaluation, 20 minutes colouring in required
Notiusweb
Licensed Customer
Posts: 1285
Joined: Mon Nov 10, 2014 4:51 am

The render speeds themselves are slower, Tutor actually painted the idea nicely:
@1x a card runs like a prior gen version of the same card.

Now, in my case, I'd rather have many 1x cards stacked vs a few 16x cards on the mobo.
My board gives 13 cores; however, I only have 7 slots, so I can only have 4 @16x or, if I could get them to fit somehow, 7 @8x. I feel 13 cores, 1 @16x and then 12 @1x, outperforms any other option...
There is a tradeoff with scene navigation (slower, choppier movement as resolution increases), but compilation for an animation, I found, is rather instantaneous either way, so the drawback is not so bad.
Smicha has the highest-performing card count on the mobo at 8x (11 water-cooled 1080 Tis...).
The board Tutor references gives you the most PCIe slots; that is the key with that board: it actually gives x8 access to each and every card.

I don't think the 8x vs 16x difference will be as noticeable as 1x vs 16x, that's for sure.
But in the case of 8x vs 16x, I don't know myself, from lack of experience, if there is even a perceptible, or measurable, difference.
Tutor, what has been your experience 8x vs 16x, anything, or not so much?

BTW - This ASRock board runs all 13 at 1x. For the past few years I communicated with their reps and support as I maximized my own GPU core count to 13, telling them I could never get past 13.
Then they release this new board with...Lucky 13....and I was like....Hmmmm....Co-winky-dink? :D
If they had gotten it up to 14, I would be hounding them night and day, dusk and dawn.
Win 10 Pro 64, Xeon E5-2687W v2 (8x 3.40GHz), G.Skill 64 GB DDR3-2400, ASRock X79 Extreme 11
Mobo: 1 Titan RTX, 1 Titan Xp
External: 6 Titan X Pascal, 2 GTX Titan X
Plugs: Enterprise
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

The Illusion of Life*/. Animation is the illusion of life, where it takes between 24 and about 30 frames (depending on format) to portray just one second of the illusion, and 60 times that number of frames to create just one minute of it, etc., etc.
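The frame counts implied above add up quickly; a rough sketch (using the 24 and 30 fps figures mentioned):

```python
# Back-of-the-envelope frame-count arithmetic for animation.
def frames_needed(seconds, fps):
    """Total frames to render for a clip of the given length and frame rate."""
    return seconds * fps

print(frames_needed(1, 24))   # one second of film-rate animation: 24 frames
print(frames_needed(1, 30))   # one second of video-rate animation: 30 frames
print(frames_needed(60, 30))  # one minute at 30 fps: 1800 frames
```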
Rik wrote: Afraid I'm a little confused with Pcie x1, x8, x16 etc. Is x8 a bottleneck compared to x16?
From what I understand, these are the speeds at which information can go from the CPU to GPUs.
So whilst this won't increase the pretty 'colouring in' part of rendering process, it will affect the length of time taken for the black screen of misery that is 'scene evaluation'


Scene evaluation time does delay the beginning of “the pretty colouring in.” x8 is one-half the transport speed of x16, but x8 is eight times the transport speed of x1. Of course, x4 is four times the transport speed of x1.
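Those ratios are simple lane-count multiples; a minimal sketch (lane widths only, ignoring PCIe generation):

```python
# PCIe transport speed scales linearly with lane count within one generation,
# so the ratio between two link widths is just the ratio of their lane counts.
def speed_ratio(lanes_a, lanes_b):
    """How many times faster an x{lanes_a} link is than an x{lanes_b} link."""
    return lanes_a / lanes_b

print(speed_ratio(16, 8))  # 2.0 -> x8 is half the transport speed of x16
print(speed_ratio(8, 1))   # 8.0 -> x8 is eight times the speed of x1
print(speed_ratio(4, 1))   # 4.0 -> x4 is four times the speed of x1
```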
Rik wrote: 95% of the time, when I hit the 'open Octane viewport' it's just for a quick test render. Once the colouring in has started it only takes a few seconds to see what I need to (once the 'scene evaluation' misery has finally finished.)

I think I'm trying to say that the bottleneck for me is the 'scene evaluation ' process.

What's the best hardware config for this, and is Pcie x16 going to makes a difference?



I feel much like you do about that pesky 'scene evaluation' misery, but on steroids. Whether an x16, x8, or x4 transport is best depends on your usage, in particular whether you animate and for what target audience. I animate for large displays, now mainly 4K, and a typical animation project for me involves rendering thousands of frames.

Wikipedia has an excellent article on the subject at https://en.m.wikipedia.org/wiki/PCI_Express . The chart on the top right of the page summarizes the relevant transport speed differences between v1.x, v2.x, v3.x, v4.x, and v5.x motherboards' PCIe connectors with a single-lane (x1) vs. a 16-lane (x16) connector. The transport speed for a v1.x board with an x1 connector is 250 MB/s vs. 4 GB/s for an x16 card in an x16 connector on that same motherboard. Most of my motherboards (except for the v1.x boards in my three 2007 Mac Pros) are v3.x, and the remainder are v2.x. My v3.x systems would have a transport speed of 985 MB/s in an x1 slot and about 7.9 GB/s in an x8 slot [I approximated that figure by halving the 15.75 GB/s x16 figure]. In sum, the transport speed of an x8 slot is about eight times that of an x1 slot and, as Wikipedia shows, an x16 slot has twice the transport speed of an x8 slot. The problem for me is that Supermicro motherboards with lots of x16 slots cost more than I want to pay.
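The per-version figures quoted above can be tabulated; a small sketch (the per-lane MB/s values are the commonly cited one-direction approximations, so treat them as assumptions rather than exact spec numbers):

```python
# Approximate per-lane PCIe throughput in MB/s (one direction),
# per the commonly cited figures for each generation.
PER_LANE_MBPS = {"v1.x": 250, "v2.x": 500, "v3.x": 985, "v4.x": 1969, "v5.x": 3938}

def link_speed_mbps(version, lanes):
    """Total link throughput in MB/s for a given PCIe version and lane count."""
    return PER_LANE_MBPS[version] * lanes

print(link_speed_mbps("v1.x", 16))  # 4000 MB/s (~4 GB/s, the v1.x x16 figure above)
print(link_speed_mbps("v3.x", 1))   # 985 MB/s  (the v3.x x1 figure above)
print(link_speed_mbps("v3.x", 8))   # 7880 MB/s (~7.9 GB/s, the v3.x x8 figure above)
```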

At the bottom of that Wikipedia article is a PCI Express Link Performance chart that is worth reviewing because it also contains the throughput analysis for the in-between configurations.
Rik wrote: For the other 5% of renders, the high res, many samples ones, I know it's going to take 20 minutes for the colouring in but that's fine as my dog needs a walk.

quick test render (eg is the sun in correct position) 70 seconds scene evaluation, 5 seconds colouring in required (annoying ratio)
final production render = 70 seconds scene evaluation, 20 minutes colouring in required
Currently, I have over 144 GPU processors in my render farm, most of which are pre-Pascal. On average I have 7 GPU processors per system (and it's a loose average because some systems have 2 GPU processors and some have a lot more, plus I move my GPUs around as the need arises). I don't know for sure whether my system loads the entire animation into the GPU only once at the start of a render, but I seriously doubt it because of the system pauses between frame renders. I run Octane, Redshift, Thea, and FurryBall GPU rendering software. But as the following points (and the linked articles) show, that isn't determinative:

1) A 30-minute 4K animation running at/near 30 frames/sec wouldn't fit entirely in any of my GPUs, despite the fact that I created the animation on those very same GPUs. All of the rendered files together total at least, and more often than not more than, 11,250 GB;

2) See also, https://gamedev.stackexchange.com/quest ... -data-flow (from the app developer's point of view); and

3) See especially what the Octane user encountered at https://discourse.mcneel.com//t/octane- ... ding/28702 (from the Octane user's point of view).
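A quick sanity check on the numbers in point 1) above: the frame count follows from the length and frame rate, and dividing the quoted total by it gives the implied average size per rendered frame (the per-frame figure is just that division, not a measured value):

```python
# Frame count for a 30-minute animation at 30 fps.
minutes, fps = 30, 30
frames = minutes * 60 * fps
print(frames)  # 54000 frames

# Implied average size per rendered frame, using the 11,250 GB total above.
total_gb = 11250
mb_per_frame = total_gb * 1024 / frames
print(round(mb_per_frame, 1))  # ~213.3 MB per frame
```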


*/Thanks to this great book by Ollie Johnston and Frank Thomas, then at Disney Studios.
Last edited by Tutor on Thu Nov 02, 2017 3:20 am, edited 1 time in total.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Painting on a slant.
Notiusweb wrote:The render speeds themselves are slower, Tutor actually painted the idea nicely:
@1x a card runs like a prior gen version of the same card.

Notiusweb,
Thanks again for the iClone lead. Additionally, when you've got a few moments to spare, take a look at this site - https://en.m.wikipedia.org/wiki/PCI_Express - which I've referenced above. There's future, as well as past, painting going on toward the bottom of that Wikipedia article, in the PCI Express Link Performance chart. You can see it if you look at a block, then compare it to the block one level down and one column to the left. The transport speeds of next-gen PCIe connectors should move data to GPUs about as fast as the prior gen at twice the lane width, i.e., PCIe generational transport speed doubling has been, and will likely continue to be, at play. For us animators, that's a good thing.
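The doubling pattern described above can be sketched in a few lines (an idealized model that assumes exact doubling each generation, which is only approximately true of the real per-lane rates):

```python
# Idealized PCIe scaling: per-lane bandwidth doubles each generation,
# so a next-gen link at half the lanes matches the prior gen at full width.
def relative_speed(gen, lanes):
    """Link speed in arbitrary units, with gen 1 at x1 defined as 1."""
    return (2 ** (gen - 1)) * lanes

print(relative_speed(3, 8))   # 32 -> a v3 x8 link...
print(relative_speed(2, 16))  # 32 -> ...matches a v2 x16 link
print(relative_speed(4, 8) == relative_speed(3, 16))  # True: the pattern repeats
```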
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.
Rik
Licensed Customer
Posts: 423
Joined: Wed Jul 28, 2010 8:57 pm

Cheers, much useful info there for sure!

Coming to the conclusion that I can't buy a PCIe v5 board just yet ;-)

I'm thinking, but may well be wrong...

To minimise the misery of 'scene loading', one needs the fastest possible data transfer between the CPU and the graphics cards.

This is currently PCIe v3 with 16 lanes.

Not many motherboards offer more than 2 slots to support this. I think the best I've seen is x16 x16 x8 x8, so I guess the scene loading times would be dragged down to x8.

Not many CPUs have enough lanes anyway. Top-end Intels have 44 lanes; top-end AMDs have 64. Some of these lanes might be used by other components too, making it less likely you'll get multiple cards fed by a full 16 lanes.

So it's all a bit of a balancing act between CPU lanes, motherboard slots and graphics card lanes & cores.

Too many variables!!!!!!!! Time for breakfast
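The lane-budget balancing act above can be roughed out in a few lines (the 44- and 64-lane counts are the ones quoted in the post; the 4-lane chipset reservation is an assumption for illustration):

```python
# Rough CPU PCIe lane budget: how many GPUs can each get a full-width link?
def gpus_supported(cpu_lanes, lanes_per_gpu, reserved_for_chipset=4):
    """GPUs that fit in the lane budget after a (hypothetical) chipset reservation."""
    return (cpu_lanes - reserved_for_chipset) // lanes_per_gpu

print(gpus_supported(44, 16))  # 2 GPUs at x16 on a 44-lane Intel CPU
print(gpus_supported(44, 8))   # 5 GPUs at x8
print(gpus_supported(64, 16))  # 3 GPUs at x16 on a 64-lane AMD CPU
```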
Rik
Licensed Customer
Posts: 423
Joined: Wed Jul 28, 2010 8:57 pm

Hadn't seen this before...

https://www.pugetsystems.com/labs/artic ... rison-790/

Unfortunately they don't give times for scene loading at x8 vs x16.

Having scoured the forums it seems you need

a mucho-fast-GHz CPU (multiple cores not really important) with plenty of PCIe lanes for the initial scene loading/calculation (seems 4 cores is the way to go - thanks Glimpse)

a mobo with plenty of PCIe lanes, capable of PCIe 3.0 x8 x8 x8 x8 (x16 doesn't seem to help)

4 of the best cards you can afford/fit

Is this far off?

Thinking now, however, that I'm getting hung up on PCIe lanes when the real bottleneck (for me at least) is the time taken on scene evaluation, which is down to the CPU. (Ahem, I'm using an AMD Phenom X6, nearly 8 years old :oops: )
Tutor
Licensed Customer
Posts: 531
Joined: Tue Nov 20, 2012 2:57 pm
Location: Suburb of Birmingham, AL - Home of the Birmingham Civil Rights Institute

Rik wrote:Hadn't seen this before...

https://www.pugetsystems.com/labs/artic ... rison-790/

Unfortunately they don't give times for scene loading at x8 vs x16.

Having scoured the forums it seems you need

a mucho fast Ghz CPU processor (multiple cores not really important) with plenty of Pcie lanes for the initial scene loading/calculation (seems 4 cores is the way to go - thanks Glimpse)

a mobo with plenty of Pcie lanes capable of Pcie 3.0 x8 x8 x8 x8 (x16 doesn't seem to help)

4 of the best cards you can afford/fit

Is this far off?

Thinking now however that I'm getting hung up on Pcie lanes when the real bottleneck (for me at least) is the time taken on scene evaluation which is down to the CPU. (Ahem I'm using AMD Phenom X6 , nearly 8 years old :oops: )

You aren't far off. Just keep in mind that scene evaluation time can vary even on the same system - scene evaluation time is scene dependent. It can be an occasional noticeable bother when rendering just one frame, but it becomes a really big, regular pain if you're rendering 4K or larger-format animations. That's why I render my big jobs on my X9DRX systems and am moving away from Amfeltec GPU Oriented Splitters on those systems running Octane V3. Each of my X9DRX systems has ten x8 slots and one x4 slot. On them, my scene evaluation times are hardly noticeable now. On each system, I'm running 5 GPU cards on low-profile x8-to-x16 riser cards, and on each such system 5-6 more GPU cards on x16-to-x16 riser cables - with those riser cables on low-profile x8-to-x16 riser cards.
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - Character limit prevents detailed stats.