Compiling scene time not related only to the PCIe speed?

Mon Feb 08, 2016 9:19 am

glimpse wrote:
Tutor wrote:Update: my 32-core system compiles the test scene in Octane V2.25 in 1 min. 28 secs.
I don't know why I start laughing..again after reading this.. =DDD..this is insane..

I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.

Mon Feb 08, 2016 9:56 am

smicha wrote: I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.

Yeah, mine is simillar as well.

Mon Feb 08, 2016 11:12 am

glimpse wrote:
smicha wrote: I just checked it in 2.25 - same results as in 3.04 - around 55-57 seconds on 2600k.
Yeah, mine is simillar as well.

Same

Mon Feb 08, 2016 3:18 pm

Well that is just normal. My dual xeon E5-2687W with 16 physical cores starts rendering in 1:13 minutes.

2600k is 3.4 GHz - 3.8 GHz boost and mine is 3.1 Gz - 3.8 GHz boost and since the compilation of the scene is not very multi threaded a speed of a single core has more influence then the number of cores in the system.

Mon Feb 08, 2016 7:53 pm

I don't have time to read the full thread, but let me clear up one thing:

Scene compilation is done to 100% on the CPU - the GPUs are for rendering only. That means, PCIe speeds are not relevant here. Of course, the time until the first image appears includes the, scene upload time, actual render time for the first sample/pixel and the tone map time. Depending on the scene and resolution you can ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?) or you cannot. Just looking at the CPU usage gives you a good indication when the scene compilation has finished.

Anyhow, we are currently looking into the compilation times and I hope we can improve things there. In the next few weeks we will know more.

Mon Feb 08, 2016 7:59 pm

abstrax wrote:Scene compilation is done to 100% on the CPU - the GPUs are for rendering only. That means, PCIe speeds are not relevant here. Of course, the time until the first image appears includes the, scene upload time, actual render time for the first sample/pixel and the tone map time. Depending on the scene and resolution you can ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?) or you cannot. Just looking at the CPU usage gives you a good indication when the scene compilation has finished.

Marcus,

The problem is that only 4 cored cpu (such as old 2600k) is stressed fully to 100% (not all the time) and other massive xeons gpus are not using their all power - compiling (not talking about scene upload) on 2600k takes 2x shorter than on 4 Xeons with 32 cores. This is the problem.

Mon Feb 08, 2016 8:20 pm

smicha wrote:
abstrax wrote:Scene compilation is done to 100% on the CPU - the GPUs are for rendering only. That means, PCIe speeds are not relevant here. Of course, the time until the first image appears includes the, scene upload time, actual render time for the first sample/pixel and the tone map time. Depending on the scene and resolution you can ignore upload (how much data needs to be transferred to the GPUs?), rendering (how fast does it render?) and tone mapping (how big is the image you are rendering?) or you cannot. Just looking at the CPU usage gives you a good indication when the scene compilation has finished.
Marcus,

The problem is that only 4 cored cpu (such as old 2600k) is stressed fully to 100% (not all the time) and other massive xeons gpus are not using their all power - compiling (not talking about scene upload) on 2600k takes 2x shorter than on 4 Xeons with 32 cores. This is the problem.

That's because you can't parallelize scene compilation indefinitely. There are most likely some bottlenecks here and we will try to solve them in the coming weeks, but the fundamental problem is that more CPU cores means that you have to split your work into independent tasks that can run fully in parallel which is not always possible. Often CPUs with more cores, run the cores with a lower clock rate, i.e. with less performance per core. If your algorithms doesn't run very well in parallel, those CPUs will run slower than CPUs with less cores but higher clock rates.

Again, the last time we have spent resources on scene compilation is a few years ago, so I expect that we can improve things here, and we are currently looking into it.

Mon Feb 08, 2016 8:21 pm

Thank you, Marcus.

Mon Feb 08, 2016 10:53 pm

I expect (ie not surprised) that anyone who was focused on CPU rendering in the past 10 years should have, or should be rethinking the way they put their systems together (I think about system architecture limitations every time I build).

Intel Marketing aside; (past 10 years) there are only a few very specific system (build/uses) which gain benefit from a slower, wider access to the CPU (more cores) vs a faster, narrower access to the CPU. In my systems this is due to all the single-thread constrained processes that I need executed.
As you could imagine these builds/uses that genuinely take advantage are predominantly handling 10-100s of individual I/O requests per second... ie. more than one person can generate.
The first exception to this is rendering; where parallelisation has been around for a long time and a single user can initial a massively wide workload for up to 36 cores (generally in Windows). This rendering workload width part should come as no surprise to all of us; since we have all chosen GPU (100's/1000/s of cores @ ~1ghz) vs CPU (up to 36 cores in most windows apps @ 2.5-3.5ghz)...
If you previously bought more CPU cores its likely because you genuinely needed them for CPU rendering; and either didn't know you were taking a speed-hit on all your single-thread constrained operations, or were happy with the compromise (a few seconds longer on operations all day to shorten that 20hr render time by half - would have made a lot of sense to you).

Revit has only recently become multi-thread aware, and its been a gradual drip-feed process to migrate processes to achieve parallelisation. My understanding that for the majority of its history basic vector drawing has been a single-threaded operation, so it makes sense that any of us using CAD bases software will benefit from the higher speed, narrower access to the CPU resources. For me, the few seconds all day long are more important since I wasn't doing any 20hr renders. I've been building my systems to take advantage of this as long as I've been using CAD (since the golden Opteron overclocking days). This meant buying for clock speed (inc via Overclock), not # of cores.

Back in the (recent: C2D & C2Q) golden days; as touched on by Tutor, the bus/FSB speed was increased to achieve overclock, with the byproduct that anything travelling along the bus increased too; this was generally great on enthusiast builds using overclock capable parts (most of you are using this stuff even if you don't realise). Not only did your processor speed increase; but so does your cache, memory and access to that memory (different to raw memory speed). my Q9550 @ ~4ghz & Twin2x2048-8500(2x 4Dimms) @ ~1400mhz were blazingly fast. I genuinely only upgraded because I wanted the new UEFI functionality not available on my old, beloved FSB beast C2Q motherboard, and I needed more than 8gb RAM and more PCIE lanes. I mention this only as context; overclocked systems are generally faster than their equal GHZ stock clock equivalents. (prob exacerbated by Intel's focus on (power) efficiency and technology for the last couple of iterations)

all the fluff above aside; its no surprise to me that as we compartmentalise the parallel parts of our system workload to the GPU (whose strength is massively parallel computation) that we free up the other parts of our systems to suit the workload that it will be primarily responsible for (ie MOOOORE GHZ for those single-thread or few-thread constrained workloads)

Mon Feb 08, 2016 11:00 pm

I don't understand half of that

)

But the point is that Octane developers are super smart and they can make a code that utilizes 100% CPU power if they want to.

And the benefit will be hours and hours saved not waiting for the scene to compile so I think the benefit is well worth the effort to improve on this.