Cards Failing
Posted: Tue Feb 26, 2013 10:52 am
by Coolcat007
Since yesterday I have had a new rendering system with the following hardware:
i5 3570
2x EVGA GTX 660 Ti
12 GB DDR3
Samsung 840 Series SSD
I have SLI disconnected and disabled.
The installed NVIDIA driver is 314.07, with CUDA Toolkit 5.0.
The problem is that the cards fail at random. If I use only one card at a time, I get the same result.
When I then try to close Octane, I get a notice about a VGA driver hang/recovery.
I have tested Octane v1.10 final and v1.11 RC.
Should I install an older NVIDIA driver? This is a very annoying problem.
Re: Cards Failing
Posted: Tue Feb 26, 2013 11:28 am
by Coolcat007
I tested a little more, and it seems the problem only happens when I use my second card.
Are there any known issues where only the primary card (the one used by the OS) works without problems?
Otherwise I will need to swap out both cards, and I want to put that off as long as I can.
Re: Cards Failing
Posted: Tue Feb 26, 2013 1:27 pm
by kavorka
Did you look at the log file?
(Help > Open log)
It might give you some insight, or allow a dev to diagnose the problem.
Also, is your VRAM full? That sometimes causes my cards to fail. Beyond that, I don't know.
Did you try running with just the second card to make sure it isn't defective?
Re: Cards Failing
Posted: Tue Feb 26, 2013 2:26 pm
by Coolcat007
There are a lot of "error 999: unknown internal error" entries:
################################################################################
Started logging on 26.02.13 15:19:48
################################################################################
CUDA error 999 on device 0: An unknown internal error occurred.
-> Kernel execution failed (dl)
CUDA device 0: Direct lighting failed
CUDA error 999 on device 0: An unknown internal error occurred.
-> Failed to copy memory to device.
CUDA device 0: Failed to load data of data texture 20 of context 0 onto device
CUDA error 999 on device 0: An unknown internal error occurred.
-> Failed to deallocate device array
CUDA error 999 on device 0: An unknown internal error occurred.
-> Could not get memory info
CUDA device 0: Failed to update daylight data
CUDA error 999 on device 0: An unknown internal error occurred.
-> Failed to allocate device array
CUDA device 0: Failed to reallocate data texture 20 of context 0
CUDA device 0: Failed to update daylight data
CUDA error 999 on device 0: An unknown internal error occurred.
-> Failed to allocate device array
CUDA device 0: Failed to reallocate data texture 20 of context 0
CUDA device 0: Failed to update daylight data
CUDA error 999 on device 0: An unknown internal error occurred.
-> Failed to allocate device array
... this goes on for quite a while.
VRAM is almost empty. This happens even with a scene containing just a simple cube.
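To see at a glance which device and which error codes dominate a long log like the one above, the entries can be tallied with a short script. This is a minimal sketch that assumes every error line follows the `CUDA error <code> on device <n>:` pattern shown in the excerpt; the function name `tally_cuda_errors` is hypothetical and not part of Octane.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# "CUDA error 999 on device 0: An unknown internal error occurred."
# (line format assumed from the log excerpt above)
ERROR_RE = re.compile(r"CUDA error (\d+) on device (\d+):")


def tally_cuda_errors(log_text):
    """Count CUDA errors per (device, error code) pair in a log's text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            code, device = int(match.group(1)), int(match.group(2))
            counts[(device, code)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_cuda_errors.py path/to/logfile
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (device, code), n in sorted(tally_cuda_errors(f.read()).items()):
            print(f"device {device}: error {code} x{n}")
```

If nearly all of the errors land on a single device index, that would suggest one card is the culprit. This fits the advice in the thread to test each card on its own.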
Re: Cards Failing
Posted: Tue Feb 26, 2013 5:54 pm
by badmilk69
Try your system with only your second card plugged in. If it still crashes, the card is faulty.
Re: Cards Failing
Posted: Tue Feb 26, 2013 9:02 pm
by face_off
I've done some investigating into error 999. It "appears" to be a card failure, so it would be great to pull each card out in turn and see whether only one card gives the error. To recover from 999, you can sometimes reset the card (i.e. disable and then re-enable it in the CUDA Devices window). I often get 999 after Octane shuts down unexpectedly. Rendering to a viewport smaller than 512 x 512 often gets things started again, and then I can increase the render size once the card is rendering.
Re: Cards Failing
Posted: Wed Feb 27, 2013 12:27 am
by FooZe
It might pay to check your power supply wattage and make sure the power cables going to the cards are plugged in properly.
Chris.
Re: Cards Failing
Posted: Wed Feb 27, 2013 5:14 am
by face_off
My latest theory is that the card has not yet reached its correct operating clock speed on the core, memory or shader clock when the render starts.
TechPowerUp GPU-Z shows the clock speeds. Load your scene into Octane Standalone, click on a simple, single node (this starts the render), then click pause. In GPU-Z (middle tab) you should now see the clocks at their operating levels. If you have a faulty card, this tool may help identify whether one of the clocks has not started.
It's only a theory though...
Paul
Re: Cards Failing
Posted: Mon Mar 04, 2013 11:28 am
by Coolcat007
The cards are plugged in properly. When I tested each card on its own, I found that one of the two was failing, so I returned it for a new one. Last Friday I worked with both cards at the same time with no problem.
When I tried to render today, one card failed again: the new one, in the second PCIe slot.
I noticed that one PCIe slot is reported as x16 while the other is shown as x4. The x4 one is the one failing. Could this be the problem?
Re: Cards Failing
Posted: Mon Mar 04, 2013 1:22 pm
by kavorka
Did you try swapping the card positions?
Or running with just one card, but placed in the x4 slot?