GPU crashing


mihailp...@gmail.com

Jan 23, 2018, 4:37:53 AM
to Accelerad Users
Hi all,

I just installed Accelerad on my computer, but my GPU keeps crashing. My graphics card is an NVIDIA Quadro P2000, which according to the CUDA-enabled GPU list has a compute capability of 6.1. The newest driver is installed. I also tried some of the tips on the Accelerad documentation page, but I can't get it working.

Can anyone please help me with that?


Mihail Todorov

Jan 23, 2018, 8:27:09 AM
to Accelerad Users
Does it matter that the Radiance version is 5.1.0? 

Nathaniel Jones

Jan 23, 2018, 8:58:34 AM
to Accelerad Users
Hi Mihail,

The message in the lower right corner of your screen indicates a Windows timeout detection and recovery (TDR) error. The Accelerad documentation lists several steps that you can take to prevent this type of error.

If you edit your registry, note that under Windows 10 there are additional registry keys that affect the timeout, such as HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers\DCI\Timeout, which are not mentioned in Microsoft's documentation.
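
For reference, here is a minimal sketch of how the documented TDR delay could be raised from a script. It only builds a `reg add` command line and does not touch the registry. `TdrDelay` is the value name described in Microsoft's TDR documentation; the helper name `tdr_delay_command` and the 10-second delay are assumptions made for illustration. The `DCI\Timeout` key is deliberately left out because this thread does not say which value it should take.

```python
# Registry key that holds Microsoft's documented TDR settings.
GRAPHICS_DRIVERS_KEY = r"HKLM\System\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_command(seconds):
    """Build a `reg add` command that sets TdrDelay (hypothetical helper)."""
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError("TdrDelay must be a positive whole number of seconds")
    return [
        "reg", "add", GRAPHICS_DRIVERS_KEY,
        "/v", "TdrDelay",      # value name from Microsoft's TDR documentation
        "/t", "REG_DWORD",     # TdrDelay is stored as a DWORD
        "/d", str(seconds),    # delay in seconds before the driver is reset
        "/f",                  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand
    # in an elevated (administrator) command prompt.
    print(" ".join(tdr_delay_command(10)))
```

Printing the command rather than executing it keeps the sketch safe to run on any machine and leaves the actual registry change as a deliberate manual step.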

Nathaniel

Mihail Todorov

Jan 23, 2018, 9:58:43 AM
to Accelerad Users
Hi Nathaniel,

Thanks for replying! 

Before changing the registry I want to know if I have done the other steps correctly.

Do I have to install CUDA and Optix packages separately from Accelerad?

After the first crash, I added -t 1.5 to the Radiance parameters (using DIVA in Grasshopper); please correct me if that is wrong. It then crashed again.

Since I have two monitors connected, I disconnected one of them, but this did not help. Does it matter if more than one monitor is connected?


Mihail

Nathaniel Jones

Jan 23, 2018, 2:31:49 PM
to Accelerad Users
Hi Mihail,

In reply to your questions:

You do not need to install CUDA or OptiX. The relevant libraries are included in the Accelerad installation.

I do not know how DIVA works with regard to custom parameters, but adding the -t argument is usually the first step in dealing with TDR errors. Smaller timeout arguments can be helpful if -t 1.5 is not enough, so you could try -t 1 or -t 0.75. However, if a single call to the kernel takes longer than 2 seconds, then this won't help anyway. In that case, you need to use one of the other solutions mentioned in the documentation. Kernel calls can be shortened by, for instance, reducing the number of bounces with the -lr parameter.
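
The escalation described above can be sketched as follows. The code only builds candidate argument lists and does not run anything. The executable name `accelerad_rpict` appears later in this thread; the function name `tdr_candidates`, the octree name `scene.oct`, and the reduced `-lr` value of 2 are placeholders chosen for illustration.

```python
def tdr_candidates(base_args, octree, timeouts=(1.5, 1, 0.75), reduced_lr=2):
    """Yield argument lists to try in order when a TDR error occurs.

    First the -t values suggested in the thread, largest to smallest,
    then one last attempt that also adds a lower -lr value.
    `base_args` is assumed not to contain -t or -lr already.
    """
    for t in timeouts:
        yield ["accelerad_rpict", "-t", str(t)] + list(base_args) + [octree]
    # Smallest timeout combined with a lower reflection limit (assumed value).
    yield (["accelerad_rpict", "-t", str(timeouts[-1])] + list(base_args)
           + ["-lr", str(reduced_lr), octree])


if __name__ == "__main__":
    for args in tdr_candidates(["-ab", "3"], "scene.oct"):
        print(" ".join(args))
```

Each printed line is one command to try; moving to the next only makes sense if the previous one still triggered a TDR error.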

The number of monitors connected to your system has no effect on TDR.

Nathaniel

Mihail Todorov

Jan 24, 2018, 12:16:49 PM
to Accelerad Users
Hi again Nathaniel,

Thanks for clarifying that for me. 

Unfortunately, I could not fix the problem on my computer by following your advice and the Accelerad documentation. The same error continued to occur.

I just tried running an analysis on another computer here at the office, which has a monster GPU and runs Windows 10. The good news is that the card did not crash, but I got this message:

1. Solution exception:Failed to read the results!
OptiX 3.9.1 found driver 9.0.0 and 1 GPU device:

Device 0: Quadro P4000 with 14 multiprocessors, 1024 threads per block, 1480000 kHz, 8589934592 bytes global memory, 128 hardware textures, compute capability 6.1, timeout disabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 125 milliseconds for 612 objects.

OptiX kernel time: 250 milliseconds (0 seconds).

rpict: ray tracing time: 1860 milliseconds (1 seconds).


Do you know what the reason for that could be?

Mihail

Nathaniel Jones

Jan 24, 2018, 12:45:14 PM
to Accelerad Users
Hi Mihail,

The first line of the output is an error from DIVA. The rest is normal output from Accelerad that does not mention any error, so there's no information here I can use to suggest a solution. You might try running this outside of the DIVA environment to check that the output is being written correctly. Also make sure you are not using the -w parameter, which suppresses warning messages.
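
One way to run the renderer outside of DIVA and keep both output streams for inspection is sketched below. The helper name `run_and_capture` and the file names are hypothetical. The command passed in would be the same rpict call that DIVA issues, without -w; rpict writes the picture to standard output and its messages to standard error.

```python
import subprocess
import sys


def run_and_capture(command, image_path, log_path):
    """Run `command`, writing stdout (the image) and stderr (messages) to files."""
    with open(image_path, "wb") as image, open(log_path, "wb") as log:
        result = subprocess.run(command, stdout=image, stderr=log)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # actual rpict argument list when testing a real scene.
    demo = [sys.executable, "-c",
            "import sys; sys.stdout.write('image'); sys.stderr.write('note')"]
    code = run_and_capture(demo, "out.hdr", "messages.log")
    print("exit code:", code)
```

A non-zero exit code or an empty image file would point to a problem in the renderer itself rather than in the plugin that reads the results.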

Nathaniel

Efi

Jan 31, 2018, 3:39:37 PM
to Accelerad Users

Hi Nathaniel,

 

I was trying to test Accelerad with Ladybug+Honeybee but, unfortunately, I got what I think is a similar error. I have not been able to figure out the cause. I found some HDR pictures in the working directory, but there are still no results in Grasshopper. The same Grasshopper script works fine with Radiance. I also uninstalled DIVA just in case, but the same error message appeared.

Do you know what might be causing this error?

Please find below this message the error.log together with the “readme!” output.

Thank you in advance.


Efi
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ERROR.log
**************
*** PID 12944: rpict -t 10 -vth -vp -0.3705683167732593 -10.297960533791734 7.760222168529153 -vd 0.4893783893512933 0.77526370980071391 -0.39934317610542502 -vu 0.21316491736225768 0.33769170897123063 0.9168015203401616 -vh 60.000 -vv 60.000 -vs -0.500 -vl -0.500 -x 550 -y 264 -af unnamed_IMG.amb -ps 8 -pt 0.15 -pj 0.6 -dj 0 -ds 0.5 -dt 0.5 -dc 0.25 -dr 0 -dp 64 -st 0.85 -ab 3 -ad 512 -as 128 -ar 16 -aa 0.250 -lr 4 -lw 0.050 -av 0 0 0 -e error.log unnamed_IMG.oct

**************
*** PID 12596: rpict -t 10 -vth -vp -0.3705683167732593 -10.297960533791734 7.760222168529153 -vd 0.4893783893512933 0.77526370980071391 -0.39934317610542502 -vu 0.21316491736225768 0.33769170897123063 0.9168015203401616 -vh 60.000 -vv 60.000 -vs 0.500 -vl -0.500 -x 550 -y 264 -af unnamed_IMG.amb -ps 8 -pt 0.15 -pj 0.6 -dj 0 -ds 0.5 -dt 0.5 -dc 0.25 -dr 0 -dp 64 -st 0.85 -ab 3 -ad 512 -as 128 -ar 16 -aa 0.250 -lr 4 -lw 0.050 -av 0 0 0 -e error.log unnamed_IMG.oct

rpict: 0 rays, 0.00% after 0.0000 hours
rpict: 0 rays, 0.00% after 0.0000 hours
**************
*** PID  5756: rpict -t 10 -vth -vp -0.3705683167732593 -10.297960533791734 7.760222168529153 -vd 0.4893783893512933 0.77526370980071391 -0.39934317610542502 -vu 0.21316491736225768 0.33769170897123063 0.9168015203401616 -vh 60.000 -vv 60.000 -vs -0.500 -vl 0.500 -x 550 -y 264 -af unnamed_IMG.amb -ps 8 -pt 0.15 -pj 0.6 -dj 0 -ds 0.5 -dt 0.5 -dc 0.25 -dr 0 -dp 64 -st 0.85 -ab 3 -ad 512 -as 128 -ar 16 -aa 0.250 -lr 4 -lw 0.050 -av 0 0 0 -e error.log unnamed_IMG.oct

rpict: 0 rays, 0.00% after 0.0000 hours
**************
*** PID 10944: rpict -t 10 -vth -vp -0.3705683167732593 -10.297960533791734 7.760222168529153 -vd 0.4893783893512933 0.77526370980071391 -0.39934317610542502 -vu 0.21316491736225768 0.33769170897123063 0.9168015203401616 -vh 60.000 -vv 60.000 -vs 0.500 -vl 0.500 -x 550 -y 264 -af unnamed_IMG.amb -ps 8 -pt 0.15 -pj 0.6 -dj 0 -ds 0.5 -dt 0.5 -dc 0.25 -dr 0 -dp 64 -st 0.85 -ab 3 -ad 512 -as 128 -ar 16 -aa 0.250 -lr 4 -lw 0.050 -av 0 0 0 -e error.log unnamed_IMG.oct

rpict: 0 rays, 0.00% after 0.0000 hours
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 588 milliseconds for 612 objects.
OptiX kernel time: 44 milliseconds (0 seconds).
rpict: 69696 rays, 100.00% after 0.0008 hours
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
[...]
rpict: ray tracing time: 2645 milliseconds (3 seconds).
rpict: ray tracing time: 2925 milliseconds (3 seconds).
rpict: ray tracing time: 2360 milliseconds (2 seconds).
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
"READme!" message

Image-based simulation
Current working directory is set to:  C:\Radiance\dokimi\unnamed\imageBasedSimulation\


Failed to read the results!

OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:

Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.
Geometry build time: 588 milliseconds for 612 objects.
OptiX kernel time: 44 milliseconds (0 seconds).
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
[...]
Runtime error (PythonException): Failed to read the results!
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.
Geometry build time: 588 milliseconds for 612 objects.
OptiX kernel time: 44 milliseconds (0 seconds).
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
[...]
rpict: ray tracing time: 2645 milliseconds (3 seconds).
rpict: ray tracing time: 2925 milliseconds (3 seconds).
rpict: ray tracing time: 2360 milliseconds (2 seconds).
Traceback:
  line 357, in script
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Mostapha Sadeghipour

Jan 31, 2018, 3:51:58 PM
to Accelerad Users
Hi Efi,

The log doesn't seem to include any errors. It may be that the log information is not commented out, so Honeybee treats it as error output. Can you visualize the HDR file using the HDR2TIFF or HDR2GIF components?

-- Mostapha

Efi

Jan 31, 2018, 5:53:12 PM
to Accelerad Users

Hi Mostapha,

 

Thank you for your quick response. I just tried to visualize the HDR with the Honeybee component and it works fine. When I connected the path of the HDR to the “glareAnalysis” component, it also produced outputs successfully.

I noticed that when I increased the quality of the RadParameters, the error.log changed and may now include information about the specific error.

Below you can find the new error.log.

 

Efi

--------------------------------------------------------------------------------------------------------------------
ERROR.log
**************
*** PID  8960: rpict -t 10 -vth -vp 2.9926443099975586 2.4607629776000977 1.2908339500427246 -vd 0.22062274830588088 0.97407957431054992 0.049945829064473951 -vu 0.0 0.0 1.0 -vh 60.000 -vv 60.000 -vs -0.500 -vl -0.500 -x 550 -y 264 -af 21MAR900_IMG.amb -ps 4 -pt 0.1 -pj 0.9 -dj 0.5 -ds 0.25 -dt 0.25 -dc 0.5 -dr 1 -dp 256 -st 0.5 -ab 4 -ad 2048 -as 2048 -ar 64 -aa 0.200 -lr 6 -lw 0.010 -av 0 0 0 -e error.log 21MAR900_IMG.oct


rpict: 0 rays, 0.00% after 0.0000 hours
**************
*** PID   456: rpict -t 10 -vth -vp 2.9926443099975586 2.4607629776000977 1.2908339500427246 -vd 0.22062274830588088 0.97407957431054992 0.049945829064473951 -vu 0.0 0.0 1.0 -vh 60.000 -vv 60.000 -vs 0.500 -vl -0.500 -x 550 -y 264 -af 21MAR900_IMG.amb -ps 4 -pt 0.1 -pj 0.9 -dj 0.5 -ds 0.25 -dt 0.25 -dc 0.5 -dr 1 -dp 256 -st 0.5 -ab 4 -ad 2048 -as 2048 -ar 64 -aa 0.200 -lr 6 -lw 0.010 -av 0 0 0 -e error.log 21MAR900_IMG.oct


rpict: 0 rays, 0.00% after 0.0000 hours
**************
*** PID  7196: rpict -t 10 -vth -vp 2.9926443099975586 2.4607629776000977 1.2908339500427246 -vd 0.22062274830588088 0.97407957431054992 0.049945829064473951 -vu 0.0 0.0 1.0 -vh 60.000 -vv 60.000 -vs -0.500 -vl 0.500 -x 550 -y 264 -af 21MAR900_IMG.amb -ps 4 -pt 0.1 -pj 0.9 -dj 0.5 -ds 0.25 -dt 0.25 -dc 0.5 -dr 1 -dp 256 -st 0.5 -ab 4 -ad 2048 -as 2048 -ar 64 -aa 0.200 -lr 6 -lw 0.010 -av 0 0 0 -e error.log 21MAR900_IMG.oct


rpict: 0 rays, 0.00% after 0.0000 hours
**************
*** PID  1288: rpict -t 10 -vth -vp 2.9926443099975586 2.4607629776000977 1.2908339500427246 -vd 0.22062274830588088 0.97407957431054992 0.049945829064473951 -vu 0.0 0.0 1.0 -vh 60.000 -vv 60.000 -vs 0.500 -vl 0.500 -x 550 -y 264 -af 21MAR900_IMG.amb -ps 4 -pt 0.1 -pj 0.9 -dj 0.5 -ds 0.25 -dt 0.25 -dc 0.5 -dr 1 -dp 256 -st 0.5 -ab 4 -ad 2048 -as 2048 -ar 64 -aa 0.200 -lr 6 -lw 0.010 -av 0 0 0 -e error.log 21MAR900_IMG.oct


rpict: 0 rays, 0.00% after 0.0000 hours
OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 578 milliseconds for 612 objects.
OptiX kernel time: 281 milliseconds (1 seconds).
Adaptive sampling: 15 milliseconds.
Retrieved 54741 of 69696 potential seeds at level 0.
K-means performed 4 loop iterations in 391 milliseconds.
K-means produced 4091 of 4096 clusters at level 0.

OptiX kernel time: 328 milliseconds (0 seconds).
Retrieved 4040832 of 4194304 potential seeds at level 1.
rpict: internal - CUDA Error 4: unspecified launch failure
(D:/nljones/Radiance/src/rt/cuda_kmeans.cu:413)
rpict: 0 rays, 0.00% after 0.0019 hours

OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 563 milliseconds for 612 objects.
rpict: internal - Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: driver().cuLaunchGridAsync(m_function, m_gridWidth, m_gridHeight, stream) returned (702): Launch timeout, [12124209])
(D:\nljones\Radiance\src\rt\optix_util.c:120)
rpict: 0 rays, 0.00% after 0.0017 hours

OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 579 milliseconds for 612 objects.
OptiX kernel time: 1203 milliseconds (2 seconds).
Adaptive sampling: 31 milliseconds.
Retrieved 51036 of 69696 potential seeds at level 0.
K-means performed 5 loop iterations in 2594 milliseconds.
K-means produced 4089 of 4096 clusters at level 0.

OptiX kernel time: 547 milliseconds (1 seconds).
Retrieved 3861226 of 4194304 potential seeds at level 1.
rpict: internal - CUDA Error 4: unspecified launch failure
(D:/nljones/Radiance/src/rt/cuda_kmeans.cu:413)
rpict: 0 rays, 0.00% after 0.0028 hours

OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:
Device 0: GeForce GT 740M with 2 multiprocessors, 1024 threads per block, 1032500 kHz, 2147483648 bytes global memory, 128 hardware textures, compute capability 3.5, timeout enabled, Tesla compute cluster driver disabled, cuda device 0.

Geometry build time: 563 milliseconds for 612 objects.
OptiX kernel time: 907 milliseconds (1 seconds).
Adaptive sampling: 2421 milliseconds.
Retrieved 54741 of 69696 potential seeds at level 0.
K-means performed 4 loop iterations in 485 milliseconds.
K-means produced 4087 of 4096 clusters at level 0.

OptiX kernel time: 531 milliseconds (0 seconds).
Retrieved 3997378 of 4194304 potential seeds at level 1.
rpict: internal - CUDA Error 4: unspecified launch failure
(D:/nljones/Radiance/src/rt/cuda_kmeans.cu:413)
rpict: 0 rays, 0.00% after 0.0033 hours

Nathaniel Jones

Jan 31, 2018, 6:10:29 PM
to Accelerad Users
Hi Efi,

There are two types of error listed in your output. One is likely a TDR error (the one returning "(702): Launch timeout"). The other is an unspecified CUDA error. While it's hard to tell what that might be, one likely cause is that you seem to be trying to run multiple instances simultaneously. A single instance of Accelerad will use all available resources on the GPU, so Accelerad does not take kindly to multiple instantiation.
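
To separate the two kinds of failure in a long error.log, a small classifier along these lines may help. The matched strings are copied from the log above; the function name `classify_errors` and the category labels are made up for this sketch.

```python
def classify_errors(log_text):
    """Count likely TDR timeouts and unspecified CUDA failures in a log."""
    counts = {"tdr_timeout": 0, "cuda_unspecified": 0}
    for line in log_text.splitlines():
        if "(702): Launch timeout" in line:
            counts["tdr_timeout"] += 1
        elif "CUDA Error 4: unspecified launch failure" in line:
            counts["cuda_unspecified"] += 1
    return counts


if __name__ == "__main__":
    # Sample lines abbreviated from the error.log quoted in this thread.
    sample = "\n".join([
        "rpict: internal - CUDA Error 4: unspecified launch failure",
        "rpict: internal - Unknown error (Details: ... returned (702): "
        "Launch timeout, [12124209])",
        "rpict: 0 rays, 0.00% after 0.0017 hours",
    ])
    print(classify_errors(sample))
```

Counting the two categories separately makes it easier to see whether a change such as a smaller -t value or a single instance removed one kind of error but not the other.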

That said, because these errors did not show up in your previous error.log, they are not related to your previous issue. Mostapha may be correct about uncommented log info. I'm not sure how Honeybee processes Radiance's error stream, if at all.

Nathaniel

Mostapha Sadeghipour

Jan 31, 2018, 7:55:47 PM
to Accelerad Users
It's just a guess, as I can't test it myself, but based on Nathaniel's email you should make sure that you set the number of CPUs to 1. Honeybee breaks each image down into pieces to take advantage of parallel processing, but in this case that can cause an issue.

Honeybee processes the error.log file, which is created by the rpict -e flag. If a line does not start with `*` or `rpict`, it is treated as an error. This works fine with Radiance rpict conventions, but it can be an issue here.
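
The rule described above can be written out as a short sketch. This is not Honeybee's actual code, only an illustration of the stated rule; `flagged_as_errors` is a hypothetical name, and skipping blank lines is an assumption.

```python
def flagged_as_errors(log_lines):
    """Return lines that do not start with '*' or 'rpict' (the stated rule)."""
    flagged = []
    for line in log_lines:
        text = line.strip()
        if not text:
            continue  # assumption: empty lines are ignored
        if text.startswith("*") or text.startswith("rpict"):
            continue  # treated as normal rpict output
        flagged.append(text)
    return flagged


if __name__ == "__main__":
    # Lines taken from the error.log quoted earlier in this thread.
    lines = [
        "**************",
        "rpict: 0 rays, 0.00% after 0.0000 hours",
        "OptiX 3.9.1 found driver 9.1.0 and 1 GPU device:",
        "Geometry build time: 588 milliseconds for 612 objects.",
        "OptiX kernel time: 44 milliseconds (0 seconds).",
    ]
    for text in flagged_as_errors(lines):
        print(text)
```

Under this rule the OptiX and geometry timing lines are flagged even though they report no failure, which would be consistent with the "Failed to read the results!" message appearing alongside a valid HDR file.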

-- Mostapha

Nathaniel Jones

Jan 31, 2018, 8:27:02 PM
to Accelerad Users
Hi Mostapha,

This may be a compatibility issue between Honeybee and Accelerad. In Radiance, errors and warnings are usually formatted `<program name>: <error level> - <error message>` and Accelerad follows this convention. I could add a flag to turn off the debugging messages that Accelerad prints to the error stream, but in general, if a line doesn't begin with `rpict` or `accelerad_rpict`, it's probably not an error.
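
As an illustration of the format described above, a parser could match `<program name>: <error level> - <error message>` with a regular expression. This is a sketch only; the names `ERROR_LINE` and `parse_error` are hypothetical and are not taken from Radiance, Accelerad, or Honeybee.

```python
import re

# <program name>: <error level> - <error message>
ERROR_LINE = re.compile(
    r"^(?P<program>[^:\s]+): (?P<level>\w+) - (?P<message>.+)$"
)


def parse_error(line):
    """Return (program, level, message) for a formatted error line, else None."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("program"), match.group("level"), match.group("message")


if __name__ == "__main__":
    # Lines taken from the error.log quoted earlier in this thread.
    for line in [
        "rpict: internal - CUDA Error 4: unspecified launch failure",
        "rpict: 0 rays, 0.00% after 0.0000 hours",
        "OptiX kernel time: 44 milliseconds (0 seconds).",
    ]:
        print(parse_error(line))
```

Matching on the full pattern, rather than on the line prefix alone, separates real errors from progress reports that also begin with `rpict:`.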

Practically, this means Accelerad should run just fine, but Honeybee might have trouble processing the error file.

Nathaniel

Efi

Feb 1, 2018, 7:44:34 AM
to Accelerad Users

Hi Mostapha and Nathaniel,

 

I changed numOfCPUs from 4 to 1, but the CUDA problem was the same when the quality was set to 2.

Interestingly, when only the bounces were increased (quality set to 0), the CUDA error no longer occurred. Only the initial error (the compatibility issue we discussed before) remained. I could check each rad parameter separately to see which one causes the problem and let you know.
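
A one-parameter-at-a-time check of this kind could be organized as in the sketch below. The two dictionaries hold the ambient and limit options copied from the two rpict command lines quoted earlier in this thread (the first log showed no CUDA error, the second did). The other options that differ between the two commands are omitted for brevity, and the names `WORKING`, `FAILING`, and `one_at_a_time` are hypothetical.

```python
# Values copied from the two rpict command lines quoted earlier in the thread.
WORKING = {"-ab": "3", "-ad": "512", "-as": "128", "-ar": "16",
           "-aa": "0.250", "-lr": "4", "-lw": "0.050"}
FAILING = {"-ab": "4", "-ad": "2048", "-as": "2048", "-ar": "64",
           "-aa": "0.200", "-lr": "6", "-lw": "0.010"}


def one_at_a_time(working, failing):
    """Yield (changed_option, params) with a single option taken from `failing`."""
    for option, value in failing.items():
        if working.get(option) == value:
            continue  # nothing to test if both sets agree
        trial = dict(working)
        trial[option] = value
        yield option, trial


if __name__ == "__main__":
    for option, params in one_at_a_time(WORKING, FAILING):
        args = " ".join(f"{name} {value}" for name, value in params.items())
        print(f"changed {option}: {args}")
```

Each printed line is one trial to render; the first trial that reproduces the CUDA error identifies a parameter worth reporting.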

Thank you so much for your help Mostapha and Nathaniel!

 

Kind Regards,

Efi

Mostapha Sadeghipour

Feb 1, 2018, 11:35:34 AM
to Accelerad Users

> I could add a flag to turn off the debugging messages that Accelerad prints to the error stream, but in general, if a line doesn't begin with `rpict` or `accelerad_rpict`, it's probably not an error.

Sounds good. If you can remove the debugging messages or add a * in front of each message, we can take care of `accelerad_rpict` when parsing the log.

Efi, please open an issue on the Honeybee legacy repository and we will address it before the next release.

-- Mostapha

Nathaniel Jones

Feb 4, 2018, 11:28:19 PM
to Accelerad Users
It turns out that Accelerad already has a feature that turns off printing of the extra debugging output. If you set the -w flag, these messages should not be printed, and I presume this will solve the compatibility issue. Note that this also disables all regular warning messages from Radiance and Accelerad.

In the next release, Accelerad will print the program name at the beginning of all non-empty lines to stderr.

Nathaniel