Our experience (and solution) with Render Failure/CUDA errors
Posted: Tue Oct 30, 2018 9:21 am
Last week our company spent some time trying to reduce or eliminate the Render Failure and CUDA errors that frequently occurred on heavy scenes.
We tested the problematic scenes on several machines with different configurations.
One scene in particular (probably with some defective materials) causes problems on almost all of the machines (it is the scene I sent you, Ahmet).
Here are the results of our investigation:
1) Some hardware is stronger and more reliable than other hardware. These aspects in particular are really important when building a PC:
- Power supply (obviously): choose a good one with enough power for all the GPUs (we prefer the Corsair HCI series). Check whether your motherboard has a supplementary PCIe power connector for multi-GPU setups and ALWAYS connect it!
Our Asus X99 E WS and also AsRock Z97 Extreme 9 have this kind of connector.
- Cooling system: in our experience water cooling (where it can be fitted) is a really good choice, and it lets you install the GPUs directly on the motherboard (another important point).
- Check that the paging file is correctly configured on your machine. On one machine that constantly gave CUDA errors we found that the paging file was missing. Once we reconfigured it, the CUDA errors went away immediately.
2) The most important thing, which has resolved all of our Render Failure problems (so far), is to modify the TDR parameters. We chose to set TdrDelay to 20 and TdrLevel to 0. We suspect that only one of the two is really needed (in all probability the first one), but on every machine where we changed these parameters the errors went away immediately.
Here are the details:
Run "regedit" and modify the Windows Registry:
TdrLevel - Specifies the initial level of recovery. The default is to recover on timeout, which is represented by value 3.
KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrLevel
ValueType : REG_DWORD
ValueData : 0 - Detection disabled OR 3 - Recover on timeout (we set it to 0)
TdrDelay - Specifies the number of seconds that the GPU can delay the preempt request from the GPU scheduler. This is effectively the timeout threshold. The default value is 2 seconds.
KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue : TdrDelay
ValueType : REG_DWORD
ValueData : Number of seconds to delay; 2 seconds is the default (we set it to 20)
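
If you have to apply the same two values on several machines, editing by hand in regedit gets tedious. As a rough sketch (not something we used ourselves), the small Python script below builds the text of a .reg file with the key path and values listed above. The function name build_tdr_reg and the constant KEY_PATH are made up for this example; the script only produces text and does not touch the registry itself.

```python
# Sketch: generate the text of a .reg file for the two TDR values above.
# build_tdr_reg and KEY_PATH are illustrative names, not part of any tool.

KEY_PATH = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers"


def build_tdr_reg(tdr_level: int = 0, tdr_delay: int = 20) -> str:
    """Return .reg file text that sets TdrLevel and TdrDelay as REG_DWORD."""
    for name, value in (("TdrLevel", tdr_level), ("TdrDelay", tdr_delay)):
        # A REG_DWORD is an unsigned 32-bit integer.
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must fit in a REG_DWORD, got {value}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files store DWORD data as 8 hexadecimal digits.
        f'"TdrLevel"=dword:{tdr_level:08x}',
        f'"TdrDelay"=dword:{tdr_delay:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_tdr_reg())
```

Note that the delay of 20 seconds appears as dword:00000014, because .reg files write DWORD data in hexadecimal. Save the output to a file with a .reg extension, import it with administrator rights, and restart the machine so the driver picks up the new values. Back up the registry first.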
So this is our experience; we hope it helps other users.