PCIe Lanes Questions
Forum rules
Please add your OS and Hardware Configuration in your signature, it makes it easier for us to help you analyze problems. Example: Win 7 64 | Geforce GTX680 | i7 3770 | 16GB
Is there a distinguishable difference when using an 8x slot over a 16x slot, when rendering?
2x Titan Z
1 Geforce 710 (For Displaying to the monitor)
AMD 8350 CPU
32GB Ram
(Will be using the most up to date versions available. Always)
Octane 3
Octane Plugin for C4D R15
Octane Plugin for Houdini 15
https://www.pwndesign.com/
artech7 wrote: Is there a distinguishable difference when using an 8x slot over a 16x slot, when rendering?
hi, this is pretty simple. there will be no difference in render speed at all. As for a slower interface, You will not notice any difference at all. So 8x is more than enough!
Hope that helps.
Drop a line if You have more questions

glimpse wrote: hi, this is pretty simple. there will be no difference in render speed at all. As for a slower interface, You will not notice any difference at all. So 8x is more than enough!
Thanks, Glimpse. That answers my question.

I apologize if this was asked before. I searched the forums before posting, but to be honest, I didn't search very deeply.
2x Titan Z
1 Geforce 710 (For Displaying to the monitor)
AMD 8350 CPU
32GB Ram
(Will be using the most up to date versions available. Always)
Octane 3
Octane Plugin for C4D R15
Octane Plugin for Houdini 15
https://www.pwndesign.com/
Referring to this old but still topical question:
it is clear that, speaking only about render time, it is not so relevant to have a PC with many PCIe 16x slots; but speaking about everyday work, when you are building a scene and have to restart your render 60 times a day, saving a minute on every scene reload would mean saving 1 hour per day!
So, my first question: is it correct to say that the target should be to have all your GPUs on the fastest (16x) PCIe slots?
If you have to build a new PC today, it seems hard to find a CPU at a reasonable price (under 1000€) that drives a large number of lanes (around 40 or more), at least in the Intel universe, correct?
Heading to the AMD world, things seem much better: an AMD CPU like the Ryzen Threadripper 1900X 3.8GHz seems to drive 64 lanes at a reasonable price (160€). At this price Intel offers at most CPUs that drive 24-28 lanes.
Is this correct, or does someone have different data/experience to share?
Third (and last) question: could it be a valid choice to aim in the AMD direction? Or does someone have a bad experience to tell us about?
Thanks a lot.
Luca
i9-10900x, 96GB DDR4, 2xRTX 2080 TI, ASUS X299 SAGE, Windows 10
http://www.visual4d.it
good day, Luca.
CPU lanes matter little and You would likely not notice any difference between x16 and x8, neither in rendering nor in any other work during the day.
You are right saying that AMD CPUs do have a bit more lanes, but there are other solutions to look at.
for instance You can get an 8700K or 9900K + ASUS WS Z390 PRO, which has a PLX chip and as a result is able to provide x8 lanes to each of four GPUs.
Alternatively, there are new server CPUs from Intel like the W-3223 that offer up to 64 lanes (for around $700) and, with a capable motherboard, can provide enough lanes to feed 7 GPUs.
For everyday tasks x8 is enough, but I would focus on CPUs that provide the fastest single-core speed, as most applications still rely on it (unless You are into simulations and such).
Thanks Glimpse for your prompt answer.
About the PLX way: if I understood correctly, the PLX chip only lets you use more lanes than the ones your CPU provides by paying a price in terms of PCIe bandwidth; in other words, the PLX mixes PCIe 3.0 / 2.0 depending on the slot used, but the total bandwidth (in terms of GB/sec) remains the same.
So, correct me if I'm wrong, that is the same as saying that when the PLX chip is active you'll have, for example, some slots running at x8 but in PCIe 2.0 mode (which is equivalent to x4 in PCIe 3.0 mode).
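Just to put rough numbers on what I'm assuming here (roughly 0.5 GB/s of usable bandwidth per PCIe 2.0 lane and roughly 1 GB/s per PCIe 3.0 lane), a quick sketch:
[code]
# Rough per-lane throughput assumptions (approximate usable GB/s, not exact spec figures)
PER_LANE_GB_S = {"PCIe 2.0": 0.5, "PCIe 3.0": 1.0}

for gen, lanes in [("PCIe 2.0", 8), ("PCIe 3.0", 4), ("PCIe 3.0", 8), ("PCIe 3.0", 16)]:
    print(f"{gen} x{lanes}: ~{PER_LANE_GB_S[gen] * lanes:.0f} GB/s")
# PCIe 2.0 x8 (~4 GB/s) would indeed be roughly PCIe 3.0 x4, and about half of PCIe 3.0 x8 (~8 GB/s).
[/code]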
If all of the above is right, the AMD way seems to be the best (and most economical) solution at the moment.
But I would really like to know your opinion.
Luca
i9-10900x, 96GB DDR4, 2xRTX 2080 TI, ASUS X299 SAGE, Windows 10
http://www.visual4d.it
no, I think that's not how it works (or I misunderstood what You said). Here is the diagram of the WS Z390 Pro (on page 111):
all GPUs are seen and work at x8 (gen3), but if You are curious how that happens, take a look at this article & scroll down to "How Does a PCI Express Switch (like the PEX 8747) Work".
let me copy-paste it here for You:
So what does the PLX chip do on a motherboard? Our best reasoning is that it acts as a data multiplexer with a buffer that organizes a first in, first out (FIFO) data policy for the connected GPUs. Let us take the simplest case, where the PLX chip is powering two GPUs, both at ‘x16’. The GPUs are both connected to 16 lanes each to the PLX chip.
The PLX chip, in hardware, allows the CPU and memory to access the physical addresses of both GPUs. Data is sent to the first GPU only at the bandwidth of 16 lanes. The PLX chip recognizes this, and diverts all the data to the first GPU. The CPU then sends data from memory to the second GPU, and the PLX changes all the lanes to work with the second GPU.
Now let us take the situation where data is needed to be sent to each GPU asynchronously (or at the same time). The CPU can only send this data to the PLX at the bandwidth of 16 lanes, perhaps either weighted to the master/first GPU, or divided equally (or proportionally how the PLX tells the CPU at the hardware level). The PLX chip will then divert the correct proportion of lanes to each GPU. If one GPU requires less bandwidth, then more lanes are diverted to the other GPU.
This ultimately means that in the two-card scenario, at peak throughput, we are still limited to x8/x8. However, in the situation when only one GPU needs the data, it can assign all 16 lanes to that GPU. If the data is traveling upstream from the GPU to the CPU, the PLX can fill its buffer at full x16 speed from each GPU, and at the same time send as much of the data up to the CPU in a continuous stream at x16, rather than switching between the GPUs which could add latency.
This is advantageous – without a PLX chip, the GPUs have a fixed lane count that is modified only by a simple switch when other cards are added. This means in a normal x8/x8 setup that if data is needed by one GPU, the bandwidth is limited to those eight lanes at maximum.
With all this data transference (and that should data be going the other way to memory then the PLX chip will have to have a buffer in order to prevent data loss) the PEX introduces a latency to the process. This is a combination of the extra routing and the action of the PEX to adjust ‘on-the-fly’ as required. According to the PLX documentation, this is in the region of 100 nanoseconds and is combined with large packet memory.
Back in the days of the NF200, we experienced a 1-3% overhead in like-for-like comparisons in many of our game testing. The PEX 8747 chip attempts to promise a reduction in this overhead, especially as it only comes into play in extreme circumstances. The situation is more complex in different circumstances (x16/x8/x8).
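and just to make that arbitration idea easier to picture, here is a tiny toy sketch of my own (a simplification, not the real PEX 8747 logic), assuming roughly 1 GB/s of usable bandwidth per gen3 lane:
[code]
# Toy model of a PCIe switch sharing one x16 uplink between two GPUs (a simplification,
# not the actual PEX 8747 behaviour): a lone transfer can burst at ~x16, simultaneous
# transfers are split proportionally, which caps out around x8 per GPU at peak.
GEN3_LANE_GB_S = 1.0  # assumed usable bandwidth per PCIe 3.0 lane

def effective_rates(demands_gb_s, uplink_lanes=16):
    uplink = uplink_lanes * GEN3_LANE_GB_S
    total = sum(demands_gb_s)
    if total <= uplink:
        return demands_gb_s                            # uplink not saturated: everyone gets what they ask for
    return [d / total * uplink for d in demands_gb_s]  # saturated: proportional share of the x16 uplink

print(effective_rates([14.0, 0.0]))   # only one GPU busy -> it keeps its full ~x16 burst
print(effective_rates([14.0, 14.0]))  # both GPUs busy    -> ~8 GB/s each, i.e. roughly x8/x8
[/code]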
one thing to consider is usage.
a lot of times people are trying to get the best they can.. but You also have to look into usage: it does not matter if You get a 6 or 20 core CPU if You only use 1 core.. actually that 6 core might be twice as fast for a fraction of the price, as its boost clock might be higher. Etc.
The same goes for lanes.. You can have 64 directly from the CPU, but the question is how well You manage to utilize them: are You limited by lanes, or is Your CPU not fast enough on that single core to provide enough data to saturate those lanes?
if You are limited by CPU speed and not by the connection between the CPU and GPUs, then what You effectively end up with is a slower and overpriced CPU that makes Your experience worse.
sure, Xeons, Threadrippers and ECC memory have their advantages that influence system stability in general. However, for speed You need to consider much more than just the plain lane count.
let me give You a simple comparison to the drag-racing world (I'm not much into cars, but this might help You understand).
You can have compound turbos boosting a four cylinder to 3000 hp - that's doable.. but what's the point? It's all about where the rubber meets the road - if You cannot deliver all that power where it is needed, what's the point of having it?
so back to lanes.. You can have x16 for each GPU, but if Your CPU is using a single core (that runs slower..) and not saturating those 16 lanes, then.. You are kind of going in the wrong direction.
back to the usage. I don't know what else You are going to use Your machine for, but.. if You are building a machine purely for OctaneRender, x8 even through a PLX is enough, and it is better to invest in a faster CPU, more memory and better cooling (that is the most important).
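if You want to check whether the link is even the bottleneck on Your machine, here is a rough sketch (assuming PyTorch with a CUDA build is installed; the ceilings in the comments are nominal figures, not a promise):
[code]
# Rough host-to-device bandwidth check (assumes PyTorch with CUDA is installed).
# If the measured rate sits well below the slot's nominal ceiling, the lanes are not Your bottleneck.
import time
import torch

def h2d_bandwidth_gb_s(size_mb=512, repeats=10, device="cuda:0"):
    src = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8).pin_memory()  # pinned host buffer
    dst = torch.empty_like(src, device=device)                                # destination on the GPU
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(repeats):
        dst.copy_(src, non_blocking=True)       # async copy over the PCIe link
    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    return size_mb * repeats / 1024 / elapsed   # GiB transferred per second

if __name__ == "__main__":
    print(f"host-to-device: ~{h2d_bandwidth_gb_s():.1f} GB/s")
    # For comparison, nominal usable ceilings are ~7.9 GB/s for PCIe 3.0 x8 and ~15.8 GB/s for x16.
[/code]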