Compiling scene time not related only to the PCIe speed?



Postby Yambo » Sat Feb 06, 2016 4:26 pm

I'm currently running my 2x Titan X at 16x PCIe speed on my X99 mobo, and I was wondering how much the scene compile time will increase at 8x speed, and whether the increase will be linear (e.g. double the time).
So I prepared a sample "heavy" scene; on my machine (specs below) it took 1:07 minutes to compile the geometry (from the moment I hit the render target button until the moment the image appears).

I sent the scene over to Sebastian (smicha). We both thought the compile time would nearly double since he is running at 8x speed, but no - it was faster! Sebastian's quote:
So opening the scene takes 7.5 seconds. When I hit render on the render target it takes 54 seconds for the image to appear. I have a 4-core 4.5GHz 2600k - a 5-year-old CPU. What I noticed in Task Manager is that around the 25th second of compilation the CPU is stressed to 100% on all cores. So the CPU speed may affect the overall time. But your 5930k has 6 cores and is faster at stock than my 4 cores at 4.5GHz. I expected longer times on my rig, not 15 seconds faster.


Some notes:
• Video attached with a recording of my test
• I did the test on Octane 3.00 and on Octane 3.04 - same speed
• The orbx for the test can be found here: https://goo.gl/nrWqi0 - please feel free to do the same test on your machine
• Screencap from GPU-Z attached for my PCIe 16x speed
• My machine didn't stress the CPU to 100% during the test, while smicha's machine does

Tutor (or any other member running dual Xeons), would you mind doing this test and posting your compile times here? (If you confirm that the CPU matters in loading geometry, it would be a strong argument for going with dual CPUs on my upcoming build.)

Any thoughts?
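Since the whole question turns on whether the CPU gets pegged during compilation, it may help to log per-core load instead of eyeballing Task Manager. Below is a small stdlib-only Python sketch for Linux that samples /proc/stat; the function names and sampling interval are my own, nothing here is Octane-specific:

```python
# Sample per-core CPU utilisation from /proc/stat (Linux only).
# Start it in a terminal, then hit the render target in Octane and watch
# which seconds of the compile actually stress the cores.
import time

def read_core_times():
    """Return {core: (busy_jiffies, total_jiffies)} for cpu0, cpu1, ..."""
    cores = {}
    with open("/proc/stat") as f:
        for line in f:
            name = line.split()[0]
            if name.startswith("cpu") and name != "cpu":
                fields = [int(x) for x in line.split()[1:]]
                idle = fields[3] + fields[4]          # idle + iowait
                total = sum(fields)
                cores[name] = (total - idle, total)
    return cores

def usage_between(before, after):
    """Per-core utilisation (%) between two read_core_times() samples."""
    usage = {}
    for core, (busy_a, total_a) in after.items():
        busy_b, total_b = before[core]
        dt = total_a - total_b
        usage[core] = 100.0 * (busy_a - busy_b) / dt if dt else 0.0
    return usage

# Usage (uncomment and run while a scene compiles):
# prev = read_core_times()
# for second in range(90):
#     time.sleep(1)
#     cur = read_core_times()
#     u = usage_between(prev, cur)
#     print(f"t={second + 1:3d}s  max core = {max(u.values()):5.1f}%")
#     prev = cur
```

Logging once per second for the full compile would show whether the 100% spike smicha saw around the 25th second appears on other rigs too.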
Attachments:
gpuz.png
304speed.mp4 (9.84 MiB)

4x 980ti EVGA | 5930k | Asus X99 E WS 3.1 | corsair 64GB RAM |SSD 500GB system + SSD 2TB working files + 6TB HDD storage WD |
Phanteks Enthoo Primo | 1600W EVGA T2 BLACK | It's the fastest 4x980ti build: http://goo.gl/hYp8e0 :)

https://yambo.me

Re: Compiling scene time not related only to the PCIe speed?

Postby jayroth » Sat Feb 06, 2016 10:31 pm

On my system, it took a little more than a minute for the scene to compile. All of my cards are in PCIe 16x slots.
CaseLabs Mercury S8 / ASUS Z10PE-D8 WS / Crucial 64GB 2133 DDR4 / 2 XEON E5-2687W v3 3.1 GHz / EVGA 1600 P2 / 2 EVGA RTX 2080Ti FTW3 Hybrid/ Cinema 4D

Is it fast? Oh, yeah!

Re: Compiling scene time not related only to the PCIe speed?

Postby smicha » Sat Feb 06, 2016 10:38 pm

What is the max CPU usage at around the 25th second of compilation?
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540

Re: Compiling scene time not related only to the PCIe speed?

Postby grimm » Sun Feb 07, 2016 12:11 am

Hope this helps. I was able to run the test scene, but it's right at the limit of my 980. :) Here is my system:

Octane 3 alpha 4
i7 5820K @ 3.8GHz CPU
32 GB DDR4 quad-channel memory
MSI Raider X99 motherboard
Linux Mint 17.3
Forgot to add that the 980 is in an 8x slot.

It took about 7 seconds to load the scene.
It took about 3 min. 26 sec. to compile.
The 980 rendered it in 2 min. 35 sec.

I took snapshots of my CPU usage as it compiled and rendered:

render-test1.png
render-test2.png
render-test3.png


I didn't see the CPUs get very busy at all. There was that one spike on a single core, but it could have been something else. If compilation were CPU bound, I think we would see a lot more load on the CPUs.

Jason
Last edited by grimm on Sun Feb 07, 2016 12:25 am, edited 1 time in total.
Linux Mint 20 x64 | Nvidia GTX 980 4GB (displays) RTX 2070 8GB| Intel I7 5820K 3.8 Ghz | 32Gb Memory | Nvidia Driver 460.56

Re: Compiling scene time not related only to the PCIe speed?

Postby pegot » Sun Feb 07, 2016 12:22 am

My system has two GTX 780s, both running on PCIe 8x lanes (one slot is 16x but shares bandwidth when the second slot is populated). My time was about 2 min. 29 sec. from scene compile to image appearance. I did not check CPU usage, but temps for all 4 cores never went above 56C. Also, not sure if this matters for CPU usage, but I forgot to turn off my internet streaming radio and had a few apps open.
Win 10
3.7Ghz i9 10900k / 64GB
ASUS STRIX Z490-E
PSU: PowerSpec 850Wd
RTX 3090 Asus Tuff

Network rendering:
Win 10
4.2Ghz i7 7700k / 64GB
AsRock SuperCarrier
PSU: EVGA 1200w
RTX 3080 Ti EVGA Hybrid
RTX 3080 ASUS Tuff
GTX 1080ti SC Black (wc)

Re: Compiling scene time not related only to the PCIe speed?

Postby jayroth » Sun Feb 07, 2016 1:39 am

Max GPU load at 25 seconds into the compile read as 0%. The graph showed some pixel-wide spikes that may have been 10%, but the numeric display read 0%. (Version 3.0 Alpha 4.)
CaseLabs Mercury S8 / ASUS Z10PE-D8 WS / Crucial 64GB 2133 DDR4 / 2 XEON E5-2687W v3 3.1 GHz / EVGA 1600 P2 / 2 EVGA RTX 2080Ti FTW3 Hybrid/ Cinema 4D

Is it fast? Oh, yeah!

Re: Compiling scene time not related only to the PCIe speed?

Postby Tutor » Sun Feb 07, 2016 6:02 am

Yambo wrote: (original post quoted in full above)

It may take me a couple of days to run the scene compile file because I haven't yet installed or tried V3 alpha, and I'll be running the test file between my current projects. I do look forward to testing V3 alpha with that complex scene file. Here are my initial thoughts:

WHY THE LATEST ISN'T ALWAYS THE GREATEST PERFORMER - OR THE RAMBLINGS OF AN OLD MAN

Before I have time to run the tests, let me point out some things. Scene compilation is a CPU/system-memory-bound and PCIe-speed-affected function. I've been a CPU tweaker/system speed hacker since the mid-1980s. I've tweaked Intel, AMD, IBM and Motorola CPUs, modding and substituting CPUs for Atari, Amiga, Apple [ http://www.computerworld.com/article/24 ... -pro-.html ], Windows and Linux systems. I've been a Hackintosher [ http://www.insanelymac.com/forum/topic/ ... ch-scores/ ]. I've underclocked Hackintoshes and manipulated Turbo Boost stages to achieve the fastest system running OSX [ http://forums.macrumors.com/threads/all ... e.1333421/ ]. I also tweak and mod GPUs. It was, and is, all done in my pursuit of the fastest systems my little duckets could/can buy for my business.

1) A WALK DOWN MEMORY LANE

So with that background, I give you the following little historical tour along Intel's CPU roadmap and its impact on CPU tweaking:

Once upon a time one could overclock Nehalem (Xeon 3500/5500, the tock) and Westmere (Xeon 3600/5600, the tick) CPUs and affect only the CPU core and memory speeds. Overclocking was then limited only by the overclocker's skill, the quality of the CPU, and the capacity of the memory to operate at the overclocked speed. The same applied to non-Xeons. If you overclocked the CPU, the memory was overclocked by the same percentage, so you had to ensure that your system memory was up to the task, or designate in the BIOS that the memory was of a lesser speed so that when the CPU overclock was applied it would not automatically push the memory speed beyond its rated limit.

Well, Intel didn't like users having that flexibility. What Intel did, besides locking the Sandy Bridge (and later) Xeons, was to put a whole host of functions that had been separately configurable via some BIOSes on some Nehalem and Westmere motherboards under the control of DMI. So if you overclocked or underclocked such a Xeon, it also affected a whole host of other things (I call them "the whole family"), such as your system's PCIe frequency, HDs, SSDs, USB, video, etc. That's not a pretty sight: anything beyond a very minimal overclock (say .1 to .4) requires luck, and even if you're very lucky, a .75 or higher overclock is practically guaranteed to bring your very best system to a halt. On Intel's enthusiast CPUs you could pay a toll by getting a CPU with a "K" in its model name, and for that toll you got "straps" to hold everything else down so you could tweak the CPU and the memory a little more liberally. But Xeons couldn't be strapped down. Ask a Sandy Bridge Xeon (or any post-Westmere Xeon) out for a date and you have to take the whole family out too, for every one of them gets treated the same. Few motherboards designed for Xeons allow any degree of overclocking.

While enthusiast motherboards can handle Xeons, subject to "taking the whole family out for a date," few professional motherboard makers provide for any overclocking - an important exception being Supermicro's DAX lineup (otherwise called "Hyperspeed"), built especially for day traders [ http://www.supermicro.com/SearchToolkit ... t_CSE.aspx ].

Remember also that some snapshots of CPU usage you may have seen were not of discrete CPUs, but rather depict cores within one or more CPUs. Not all cores are subject to the very same stressors: inner cores get hotter than outer cores, and as cores get hotter and reach certain thresholds, their speed of operation slows down. So whether a CPU is itself water-cooled matters a great deal. If the load on the CPU is sufficient to tax all of the cores, a snapshot of core usage can give you an indication of which are most likely the innermost and outermost cores in the package.

Moreover, consider that not all tasks need, or can even take advantage of, all of a CPU's cores. Could scene compilation be one such task? To be sure, more CPUs with sufficient RAM provide more IO space to support more GPUs (and my observation is that newer, more powerful GPUs require more IO space than older GPUs), but is there a sweet spot? I do not know where that spot is right now, but I strongly believe it will continue to move as GPUs become more formidable and complex.

2) SOMETIMES, THE QUICK AND DIRTY ANSWER MIGHT BE BOUND TO THE TINIEST STALK IN THE BALE OF ROTTING HAY

For those of you who have read this far and might be asking, "Tutor, why drag us down Memory Lane?" - I respond: knowing what has happened on Memory Lane might give you better clues about what we're observing. So what else might be happening when a shiny new 5930k 6-core
[ http://www.cpu-world.com/CPUs/Core_i7/I ... 5930K.html -
Standard Frequency 3,500 MHz;
Turbo Stage 1 = 3600 MHz (3 cores or more);
Turbo Stage 2 = 3700 MHz (1 or 2 cores) ]
which may appear, at first blush, to be faster at stock than someone else's "old" 4-core 2600k
[ http://www.cpu-world.com/CPUs/Core_i7/I ... 33908.html -
Standard Frequency 3,400 MHz;
Turbo Stage 1 = 3500 MHz (4 cores);
Turbo Stage 2 = 3600 MHz (3 cores);
Turbo Stage 3 = 3700 MHz (2 cores);
Turbo Stage 4 = 3800 MHz (1 core)]
overclocked to run at 4.5 GHz (and unless Turbo is turned off, all of those stages get bumped up accordingly with the overclock, i.e.,
Turbo Stage 1 = 4600 MHz (4 cores);
Turbo Stage 2 = 4700 MHz (3 cores);
Turbo Stage 3 = 4800 MHz (2 cores);
Turbo Stage 4 = 4900 MHz (1 core))
gets beat by that "old" 2600k?
It might be a symptom or outcome of the breadth of effects that overclocking can bring about. In sum: at what speed are all of the other variables that affect the desired outcome (i.e., a fast/short compile time) being run?

P.S. Just for the heck of it, compare the CPU speed ratios between the two systems' CPUs with the compile-time delta. Or, better yet, overclock that newer CPU to the same extent as the older CPU and let us know the outcome.
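Tutor's P.S. can be done on the back of an envelope with figures already posted in this thread. A rough sketch (all-core turbo clocks and single-run times; treat it as illustrative only):

```python
# Does the per-core clock ratio between smicha's overclocked 2600k and
# Yambo's stock 5930k predict the compile-time gap? Figures are taken from
# the posts above; all-core turbo clocks are used since the 100% spike
# smicha observed hit every core.

clock_2600k_oc = 4.6   # GHz: 4-core turbo stage with the +1.1 GHz OC applied
clock_5930k = 3.6      # GHz: stock all-core turbo (Turbo Stage 1)

compile_2600k = 54.0   # seconds, smicha's measurement
compile_5930k = 67.0   # seconds, Yambo's 1:07

clock_ratio = clock_2600k_oc / clock_5930k    # how much faster per core
time_ratio = compile_5930k / compile_2600k    # how much longer the compile

print(f"clock ratio:        {clock_ratio:.2f}x")   # ~1.28x
print(f"compile-time ratio: {time_ratio:.2f}x")    # ~1.24x
# The two ratios land close together, which is at least consistent with a
# compile stage dominated by single-core clock speed rather than PCIe width.
```

Two single-run data points prove nothing on their own, but the closeness of the ratios is suggestive, and the down-clocking experiment smicha proposes later would test it directly.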
Because I have 180+ GPU processors in 16 tweaked/multi-OS systems - character limit prevents detailed stats.

Re: Compiling scene time not related only to the PCIe speed?

Postby Yambo » Sun Feb 07, 2016 7:20 am

Tutor, thanks a lot for the detailed reply (as always). I will need to read it a few more times before I understand some of it, but that's great.

As I see it, we are getting very interesting results here. I would love to see some more detailed results (including CPU performance) from other forum members.
Maybe one of the guys at OTOY could shed some light on the way it "should" work, because we can definitely see it is not related only to the PCIe speed (which is what I, and probably many others, thought).

Thanks.
4x 980ti EVGA | 5930k | Asus X99 E WS 3.1 | corsair 64GB RAM |SSD 500GB system + SSD 2TB working files + 6TB HDD storage WD |
Phanteks Enthoo Primo | 1600W EVGA T2 BLACK | It's the fastest 4x980ti build: http://goo.gl/hYp8e0 :)

https://yambo.me

Re: Compiling scene time not related only to the PCIe speed?

Postby smicha » Sun Feb 07, 2016 9:21 am

This topic is becoming one of the most important in terms of Octane performance, especially for those who animate tons of scenes and must recompile every frame. And the results are highly surprising. I think it is worth searching for the true reasons behind such discrepancies in compile time - up to 4x.

1. The most striking behavior (besides everything being at 8x on my Asus WS mobo) is that my i7 2600k OC @ 4.5GHz at a constant 1.32V (all power settings in the BIOS are maxed and power-saving features are off) is stressed to 100% on all cores between the 25th and 40th second (CPU, not GPU!). If this matters (I read carefully what Tutor wrote), the max temp during compilation on all cores is 45-55C (CPU temp, not GPU - bear in mind this is a watercooled machine). The overall compile time is 55 seconds, which compares:

to Yambo - 18% faster (but Yambo's GPUs are at 16x, on a 5930k at 3.7GHz on 6 cores); comparing core clocks, 4.5/3.7 gives a 21% difference - but this is not consistent - read further on
to pegot - 2.7x faster (same cards as mine, 780 6GB; 4 cores @ 3.5GHz, i7 3770k) - comparing clocks, mine should be 28% faster, but it is 170% faster
to grimm - 3.75x faster (probably due to the 4GB of VRAM on the 980 - the scene takes around 3.6GB; the CPU is a 5820k, 6 cores @ 3.8GHz) - do you have out-of-core textures on?
to jayroth - 10% faster (I assumed it was 1 minute), but jayroth has 20 cores on dual 2687s at 3.1GHz (is it under load?) and 16x GPUs

2. In grimm's diagrams we don't see all the CPU activity during the 3+ minute compile, so it's hard to say what is going on during compilation.

3. Looking at Yambo's CPU activity, it is stressed only to 70% at the 30-40th second.

4. jayroth - I meant CPU activity, not GPU.

5. Tutor's insights are important - I'll try to down-clock my CPU to e.g. 3GHz and see the difference.

OTOY - please take a look at these numbers. This may impact the way we build our machines and the way Octane is coded.

BTW, I run Win 7 and 3.04.
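The ratios in point 1 can be re-derived from the raw times posted in this thread. A quick sketch (jayroth's ~60 s is the same assumption made above; all figures are rough, single-run measurements):

```python
# Compile times reported in the thread, in seconds, normalised against
# smicha's 55 s baseline (i7-2600k @ 4.5 GHz, GPUs at 8x).
baseline_s = 55.0

reported = {
    "Yambo   (5930K stock, 16x)":      67.0,   # 1:07
    "pegot   (3770k, 8x)":            149.0,   # 2 min. 29 sec.
    "grimm   (5820K @ 3.8 GHz, 8x)":  206.0,   # 3 min. 26 sec.
    "jayroth (2x E5-2687W v3, 16x)":   60.0,   # assumed: "a little more than a minute"
}

for rig, seconds in reported.items():
    print(f"{rig}: {seconds:5.0f} s = {seconds / baseline_s:.2f}x the baseline")
# The spread (~1.09x to ~3.75x) tracks neither PCIe width nor core count
# alone, which is the point of this thread.
```

Reproducing the 2.7x and 3.75x figures this way makes it easy for other posters to drop their own times into the table.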
Attachments:
compile1.jpg
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540

Re: Compiling scene time not related only to the PCIe speed?

Postby glimpse » Sun Feb 07, 2016 10:56 am

Here's my quick take on this experiment (1st go):

scene load time ~15 sec (give or take one or two),
compiling ~1 min 11 sec (plus/minus 1 sec).

I ran a Titan Black (at 8x, as the other PCIe slot is occupied as well) on a Z77-based mobo with a stock 3570k (no OC).

edit: So I opened my box, seated the Titan Black in the 16x slot and fired up the scene again - the result? Exactly the same. From what I can tell, in this situation my CPU was the part to blame (4 cores, no HT, etc.).

Then I fired up the ASUS auto tweaker (because I did not want to mess with a manual OC right now); after a reboot this package gave a 24% OC on the CPU and about 13% on the GPU - let's ignore the latter, as it is modest to say the least. =)

So I ran the scene again, and here are the results:

11 sec to open
1 min 1 sec to compile.

Things to take away: we already know that CPU speed does not influence rendering itself, but it does influence upload/compile speeds.

While compiling a scene you see different load levels (25% or 100%) because there are different workloads to be done: some are optimised for multithreaded CPUs (the stage where core count matters), while others still rely on single-core performance (where clock speed matters).

To have good performance you need multiple things nicely balanced. =)

Confusion about lanes (from my perspective): lane count does not equal performance. I can try to explain if someone is interested, but I'll skip that for now.
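glimpse's single-core vs multi-core point can be put into a toy Amdahl-style formula. The 40 s / 60 s split below is purely illustrative - it is not measured from Octane - but it shows why neither clocks nor cores alone predict the total:

```python
# Toy model: a compile is part single-threaded (scales with clock only)
# and part multithreaded (scales with clock x core count).

def compile_time(serial_s, parallel_s, cores, clock_ratio=1.0):
    """Estimated wall time for a workload with serial_s seconds of
    single-threaded work and parallel_s seconds of perfectly parallel
    work (both measured on 1 core at the baseline clock)."""
    return serial_s / clock_ratio + parallel_s / (clock_ratio * cores)

base = compile_time(40, 60, cores=4)                            # 55.0 s
more_cores = compile_time(40, 60, cores=8)                      # 47.5 s
faster_clock = compile_time(40, 60, cores=4, clock_ratio=1.25)  # 44.0 s

print(base, more_cores, faster_clock)
# Doubling the cores only shaves ~14% off, because the serial stage is
# untouched; a 25% clock bump helps both stages and wins here.
```

If the serial/parallel split were measured on a real rig (e.g. with the per-core logger sketched earlier in the thread), this formula would let you predict the payoff of more cores versus more clock before buying hardware.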
Last edited by glimpse on Sun Feb 07, 2016 12:08 pm, edited 2 times in total.