Scene compilation time not related only to PCIe speed?

smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

glimpse wrote:
Tutor wrote:Update: my 32-core system compiles the test scene in Octane V2.25 in 1 min. 28 secs.
I don't know why I started laughing... again, after reading this... =DDD... this is insane...
I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
glimpse
Licensed Customer
Posts: 3740
Joined: Wed Jan 26, 2011 2:17 pm

smicha wrote: I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.
Yeah, mine is similar as well.
Yambo
Licensed Customer
Posts: 345
Joined: Tue May 12, 2015 1:37 pm
Location: Tel Aviv, Israel

glimpse wrote:
smicha wrote: I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.
Yeah, mine is similar as well.
Same
4x 980ti EVGA | 5930k | Asus X99 E WS 3.1 | corsair 64GB RAM |SSD 500GB system + SSD 2TB working files + 6TB HDD storage WD |
Phanteks Enthoo Primo | 1600W EVGA T2 BLACK | It's the fastest 4x980ti build: http://goo.gl/hYp8e0 :)

https://yambo.me
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

Well, that is just normal. My dual Xeon E5-2687W with 16 physical cores starts rendering in 1:13 minutes.

The 2600k is 3.4 GHz base with a 3.8 GHz boost, and mine is 3.1 GHz base with a 3.8 GHz boost. Since scene compilation is not very multi-threaded, the speed of a single core has more influence than the number of cores in the system.
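To illustrate the point above, here is a toy model (my own numbers, not Octane internals): if compilation is essentially serial, compile time is just the serial work divided by the boost clock of one core, so two CPUs that boost to the same 3.8 GHz land on similar times regardless of core count. The `work` constant is an arbitrary assumption chosen only for illustration.

```python
# Toy model: for a mostly-serial workload, compile time tracks the boost
# clock of ONE core, not the core count. The work constant is assumed.

def compile_time(serial_work_gcycles, boost_ghz):
    """Seconds to chew through a fixed amount of serial work."""
    return serial_work_gcycles / boost_ghz

work = 210.0  # assumed gigacycles of serial compile work (illustrative)

t_2600k = compile_time(work, 3.8)  # i7-2600K: 4 cores, 3.8 GHz boost
t_xeon  = compile_time(work, 3.8)  # E5-2687W: 16 cores, also 3.8 GHz boost

# Same boost clock -> same predicted time (~55 s), despite 4 vs 16 cores.
```

Under this model the extra cores simply never enter the equation, which is consistent with the similar timings people report in this thread.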
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

I don't have time to read the full thread, but let me clear up one thing:

Scene compilation is done 100% on the CPU - the GPUs are for rendering only. That means PCIe speeds are not relevant here. Of course, the time until the first image appears includes the scene upload time, the actual render time for the first sample/pixel, and the tone-mapping time. Depending on the scene and resolution, you may or may not be able to ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?). Just looking at the CPU usage gives you a good indication of when scene compilation has finished.
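The breakdown above can be sketched as a sum of four terms, of which only the upload term involves PCIe. All numbers below are illustrative assumptions, not Octane measurements.

```python
# Toy breakdown of "time until the first image" in a GPU renderer.
# Only the upload term depends on PCIe bandwidth; compilation is CPU-side.

def time_to_first_image(compile_s, scene_bytes, pcie_gbit_s, render_s, tonemap_s):
    upload_s = scene_bytes / (pcie_gbit_s * 1e9 / 8)  # Gbit/s -> bytes/s
    return compile_s + upload_s + render_s + tonemap_s

# Example: a 2 GB scene over an assumed ~128 Gbit/s PCIe 3.0 x16 link.
total = time_to_first_image(
    compile_s=55.0,      # CPU scene compilation (dominates here)
    scene_bytes=2e9,     # geometry + textures to upload
    pcie_gbit_s=128.0,   # assumed link bandwidth
    render_s=0.5,        # first sample
    tonemap_s=0.1,
)
# Upload is ~0.125 s of ~55.7 s total: even doubling PCIe bandwidth
# would shave off well under a second.
```

This is why measuring "time to first image" tells you little about PCIe unless the scene is huge or compilation is fast.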

Anyhow, we are currently looking into the compilation times and I hope we can improve things there. In the next few weeks we will know more.
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

abstrax wrote: Scene compilation is done 100% on the CPU - the GPUs are for rendering only. That means PCIe speeds are not relevant here. Of course, the time until the first image appears includes the scene upload time, the actual render time for the first sample/pixel, and the tone-mapping time. Depending on the scene and resolution, you may or may not be able to ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?). Just looking at the CPU usage gives you a good indication of when scene compilation has finished.
Marcus,

The problem is that only a 4-core CPU (such as the old 2600k) gets stressed to 100% (and not even all the time), while the massive Xeon CPUs are not using all their power - compiling (not talking about scene upload) on a 2600k takes half the time it does on 4 Xeons with 32 cores. This is the problem.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
abstrax
OctaneRender Team
Posts: 5506
Joined: Tue May 18, 2010 11:01 am
Location: Auckland, New Zealand

smicha wrote:
abstrax wrote: Scene compilation is done 100% on the CPU - the GPUs are for rendering only. That means PCIe speeds are not relevant here. Of course, the time until the first image appears includes the scene upload time, the actual render time for the first sample/pixel, and the tone-mapping time. Depending on the scene and resolution, you may or may not be able to ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?). Just looking at the CPU usage gives you a good indication of when scene compilation has finished.
Marcus,

The problem is that only a 4-core CPU (such as the old 2600k) gets stressed to 100% (and not even all the time), while the massive Xeon CPUs are not using all their power - compiling (not talking about scene upload) on a 2600k takes half the time it does on 4 Xeons with 32 cores. This is the problem.
That's because you can't parallelize scene compilation indefinitely. There are most likely some bottlenecks here and we will try to solve them in the coming weeks, but the fundamental problem is that using more CPU cores means you have to split your work into independent tasks that can run fully in parallel, which is not always possible. Also, CPUs with more cores often run those cores at a lower clock rate, i.e. with less performance per core. If your algorithm doesn't parallelize well, those CPUs will run slower than CPUs with fewer cores but higher clock rates.
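This trade-off is exactly what Amdahl's law describes. A rough sketch, assuming (purely for illustration) that 30% of compilation parallelizes, with the core counts and clocks discussed earlier in the thread:

```python
# Amdahl's-law sketch: throughput from N cores when only a fraction p
# of the work parallelizes. The parallel fraction p is an assumption;
# core counts and clocks mirror the CPUs discussed above.

def effective_speed(cores, clock_ghz, p):
    """Throughput relative to a 1 GHz single core, per Amdahl's law."""
    return clock_ghz / ((1 - p) + p / cores)

p = 0.3  # assume only 30% of scene compilation runs in parallel

quad_fast = effective_speed(cores=4,  clock_ghz=3.8, p=p)  # i7-2600K-like
many_slow = effective_speed(cores=32, clock_ghz=3.1, p=p)  # big Xeon rig

# For a mostly-serial workload the 4-core/3.8 GHz chip comes out ahead
# (~4.9 vs ~4.4): past a point, extra cores cannot offset a lower clock.
```

With p that low, going from 4 to 32 cores buys almost nothing, while the clock difference applies to the whole runtime; that matches the reports of the 2600k beating the quad-Xeon box.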

Again, the last time we spent resources on scene compilation was a few years ago, so I expect that we can improve things here, and we are currently looking into it.
Last edited by abstrax on Mon Feb 08, 2016 8:23 pm, edited 1 time in total.
Reason: added some more clarifications
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
smicha
Licensed Customer
Posts: 3151
Joined: Wed Sep 21, 2011 4:13 pm
Location: Warsaw, Poland

Thank you, Marcus.
3090, Titan, Quadro, Xeon Scalable Supermicro, 768GB RAM; Sketchup Pro, Classical Architecture.
Custom alloy powder coated laser cut cases, Autodesk metal-sheet 3D modelling.
build-log http://render.otoy.com/forum/viewtopic.php?f=9&t=42540
prehabitat
Licensed Customer
Posts: 495
Joined: Fri Aug 16, 2013 10:30 am
Location: Victoria, Australia

I'm not surprised - anyone who focused on CPU rendering in the past 10 years should have rethought, or should be rethinking, the way they put their systems together (I think about system architecture limitations every time I build).

Intel marketing aside, over the past 10 years there have been only a few very specific builds/uses that benefit from slower, wider access to the CPU (more cores) versus faster, narrower access (fewer, higher-clocked cores). In my systems this is due to all the single-thread-constrained processes that I need executed.
As you can imagine, the builds/uses that genuinely take advantage of width are predominantly handling 10-100s of individual I/O requests per second - i.e. more than one person can generate.
The first exception to this is rendering, where parallelisation has been around for a long time and a single user can initiate a massively wide workload across up to 36 cores (generally, in Windows). The width of the rendering workload should come as no surprise to any of us, since we have all chosen GPUs (hundreds/thousands of cores @ ~1 GHz) over CPUs (up to 36 cores in most Windows apps @ 2.5-3.5 GHz).
If you previously bought more CPU cores, it's likely because you genuinely needed them for CPU rendering, and you either didn't know you were taking a speed hit on all your single-thread-constrained operations, or were happy with the compromise (a few extra seconds on operations all day to halve a 20-hour render would have made a lot of sense to you).

Revit has only recently become multi-thread aware, and it's been a gradual drip-feed process to migrate its processes to achieve parallelisation. My understanding is that for the majority of its history, basic vector drawing has been a single-threaded operation, so it makes sense that any of us using CAD-based software will benefit from higher-speed, narrower access to CPU resources. For me, the few seconds saved all day long are more important, since I wasn't doing any 20-hour renders. I've been building my systems to take advantage of this for as long as I've been using CAD (since the golden Opteron overclocking days). That meant buying for clock speed (including via overclock), not number of cores.

Back in the (recent: C2D & C2Q) golden days, as touched on by Tutor, the bus/FSB speed was increased to achieve an overclock, with the byproduct that anything travelling along the bus sped up too. This was generally great on enthusiast builds using overclock-capable parts (most of you are using this stuff even if you don't realise it): not only did your processor speed increase, but so did your cache, your memory and your access to that memory (different from raw memory speed). My Q9550 @ ~4 GHz & Twin2x2048-8500 (2x 4 DIMMs) @ ~1400 MHz were blazingly fast. I genuinely only upgraded because I wanted the new UEFI functionality not available on my old, beloved FSB-beast C2Q motherboard, and I needed more than 8GB of RAM and more PCIe lanes. I mention this only as context: overclocked systems are generally faster than their equal-GHz stock-clocked equivalents (probably exacerbated by Intel's focus on power efficiency and process technology over the last couple of iterations).

All the fluff above aside, it's no surprise to me that as we compartmentalise the parallel parts of our system workload onto the GPU (whose strength is massively parallel computation), we free up the rest of the system to suit the workload it will primarily be responsible for (ie MOOOORE GHz for those single-thread or few-thread-constrained workloads).
Win10/3770/16gb/K600(display)/GTX780(Octane)/GTX590/372.70
Octane 3.x: GH Lands VARQ Rhino5 -Rhino.io- C4D R16 / Revit17
BorisGoreta
Licensed Customer
Posts: 1413
Joined: Fri Dec 07, 2012 6:45 pm

I don't understand half of that :o)

But the point is that the Octane developers are super smart, and they can write code that utilizes 100% of the CPU power if they want to.

And the benefit will be hours and hours saved not waiting for scenes to compile, so I think it's well worth the effort to improve this.