Postby abstrax » Wed Dec 09, 2015 3:49 am
Overhaul of the integration kernels
Since the beginning of Octane, the integration kernels had one CUDA thread calculate one complete sample. We changed this for various reasons, the main one being that the integration kernels had become really huge and impossible to optimize. Also, OSL and OpenCL are pretty much impossible to implement this way. To solve the problem, we split the big task of calculating a sample into smaller steps, which are then processed one by one by the CUDA threads, i.e. a lot more kernel calls are happening than in the past.
There are two major consequences of this new approach: Octane needs to keep information for every sample that is calculated in parallel between kernel calls, which requires additional GPU memory. And the CPU is stressed a bit more, since it has to issue many more kernel launches...
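To make the trade-off concrete, here is a toy CPU-side sketch in Python. It is not Octane's code: `SampleState`, `bounce`, `run_megakernel` and `run_split_kernels` are hypothetical names, and the "shading" is a made-up placeholder. It only illustrates the two points above: the split approach has to keep a state record per in-flight sample between launches, and it needs more launches to produce the same result.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleState:
    """Hypothetical per-sample data that must survive between kernel launches."""
    throughput: float = 1.0
    radiance: float = 0.0
    depth: int = 0
    alive: bool = True


def bounce(state, sample_index):
    """One small step of a path (toy placeholder, not real shading)."""
    # Deterministic pseudo-random number per (sample, depth), so both
    # schedules below walk exactly the same paths.
    u = random.Random(sample_index * 1000 + state.depth).random()
    state.radiance += state.throughput * 0.1
    state.throughput *= 0.5
    state.depth += 1
    if u < 0.3:  # toy termination probability
        state.alive = False


def run_megakernel(num_samples, max_depth):
    """Old style: one launch, each 'thread' computes a complete sample."""
    results = []
    for i in range(num_samples):  # stands in for one thread per sample
        state = SampleState()     # lives only in 'registers' of that thread
        while state.alive and state.depth < max_depth:
            bounce(state, i)
        results.append(state.radiance)
    return results, 1             # a single kernel launch


def run_split_kernels(num_samples, max_depth):
    """New style: many small launches, state kept in a buffer in between."""
    states = [SampleState() for _ in range(num_samples)]  # extra memory
    launches = 0
    for _ in range(max_depth):
        active = [i for i, s in enumerate(states) if s.alive]
        if not active:
            break
        for i in active:          # one small 'kernel' handles one step
            bounce(states[i], i)
        launches += 1             # the host has to issue every launch
    return [s.radiance for s in states], launches


if __name__ == "__main__":
    mega, mega_launches = run_megakernel(8, 4)
    split, split_launches = run_split_kernels(8, 4)
    print("same result:", mega == split)
    print("launches:", mega_launches, "vs", split_launches)
```

In a real renderer each step would likely be several specialised kernels rather than one, so the launch count grows further, and the state buffer scales with the number of samples processed in parallel.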
Postby abstrax » Wed Jan 13, 2016 11:14 pm
Yan wrote:
When will you release support for OpenCL?
Thanks!
OpenCL has been on hiatus for quite a while and we will continue working on it after OSL has been done, i.e. it may become available around the middle of the year. Please be aware that there is a chance that we may run into insurmountable problems and can't get it working (Octane is not a simple path tracer anymore). I didn't run into any issues during my initial tests, but everybody tells me how bad it is, so we will have to see.
There is also a small possibility that AMD's CUDA implementation, part of AMD's Boltzmann Initiative, could be used instead of an OpenCL implementation, but we will see when we get there. Most likely it will not be useful for Octane.
If OpenCL is unattainable, does this untie your hands to optimize the code in favor of the CUDA model? As in, it could be optimized to not have so many kernel calls, leading to less stacked communication with the GPU.
I'm not going to say I'm rooting for this necessarily, but I guess I might be rooting for this unnecessarily.