Strange source artifact in 2016.4


Nicholas Dana

Apr 26, 2016, 7:47:36 PM
to mcx-users
Dr. Fang,

First off, thank you for releasing an update for MCXLAB. It seems like some very useful features have been implemented. FYI, I'm using a Windows 7 system with dual graphics cards (770 and 730), MATLAB 2016a, and the CUDA 7.0 toolkit.

I downloaded the MCXLAB binary and it has run without issue for me so far (assuming I've set all parameters appropriately). While I've not tested it extensively, I have found a persistent artifact when using 'gaussian' or 'disk' sources. Somewhere in the photon launch there are a few "hotspots" which appear to be periodic and radiate out from the center of the source. For what it's worth, given their shape, I might guess they are the result of some convolution process, but I'm not strong enough in C to know where to look.

The artifacts are subtle and can be missed when examining the data in log10 format, but they're quite evident in the linear output, as you can see from the images I've attached. For reference, I have included my code.

I have not yet recompiled MCXLAB on my own machine to see if the artifacts are reproducible with a new binary. Just wanted to let you know.

Thank you again for your and your team's hard work! 
SumOverZ.png
mcTestSourceArtifact.m
SliceImgLinear.png
SliceImgLog10.png
SliceThroughArtifact.png

Qianqian Fang

Apr 26, 2016, 8:56:10 PM
to mcx-...@googlegroups.com
On 04/26/2016 07:47 PM, Nicholas Dana wrote:
> Dr. Fang,
>
> First off, thank you for releasing an update for MCXLAB. It seems like some very useful features have been implemented. FYI, I'm using a Windows 7 system with dual graphics cards (770 and 730), MATLAB 2016a, and the CUDA 7.0 toolkit.
>
> I downloaded the MCXLAB binary and it has run without issue for me so far (assuming I've set all parameters appropriately). While I've not tested it extensively, I have found a persistent artifact when using 'gaussian' or 'disk' sources. Somewhere in the photon launch there are a few "hotspots" which appear to be periodic and radiate out from the center of the source. For what it's worth, given their shape, I might guess they are the result of some convolution process, but I'm not strong enough in C to know where to look.

That does look worrisome, but I tried your script and was not able to reproduce
this issue.

The screenshot of my output can be found in the attachment. I ran it on
Ubuntu 14.04 with CUDA 7.0 and MATLAB 2010b. I used a 980Ti for this
simulation.

Since you have two cards, can you try a different GPU?

Also, try downloading the new binary package I uploaded a few hours
ago:

https://sourceforge.net/projects/mcx/files/mcxlab/2016.4%20%28Dark%20Matter%20beta%29/

I fixed a bug related to selecting multiple GPUs (cfg.gpuid).



> The artifacts are subtle and can be missed when examining the data in log10 format, but they're quite evident in the linear output, as you can see from the images I've attached. For reference, I have included my code.
>
> I have not yet recompiled MCXLAB on my own machine to see if the artifacts are reproducible with a new binary. Just wanted to let you know.


I notice you are still using cfg.respin, although you do have a
dedicated GPU. It will be much more efficient if you set cfg.respin to 1.


> Thank you again for your and your team's hard work!

let me know what you find.

Qianqian

--
You received this message because you are subscribed to the Google Groups "mcx-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mcx-users+...@googlegroups.com.
To post to this group, send email to mcx-...@googlegroups.com.
Visit this group at https://groups.google.com/group/mcx-users.
For more options, visit https://groups.google.com/d/optout.

src_profile.png

Nicholas Dana

Apr 26, 2016, 10:19:56 PM
to mcx-users
Dr. Fang,

I ran my script on a different Windows 7 machine with a different video card (Quadro K600) and saw the same artifact in the source.

Additionally, I booted the original computer (with the two graphics cards) into its Ubuntu partition and ran the script using the GT 730 (the same card that generated the artifacts before) and got the same results you posted above (i.e. no artifacts at all).

In summary, this does not appear to be a result of the hardware, but rather something particular to the Windows binary.

As to your comments regarding using respin with multiple GPUs: on Windows, even with two GPUs, I still hit the TdrDelay limit when using the second GPU. In my experience, Windows 7 applies the TdrDelay limit to both GPUs, regardless of whether or not one is rendering graphics. I've set my limit to 10 seconds to reduce the number of respins, but I've found no way to set a TdrDelay value for GPU 1 and not for GPU 2. Maybe someone else knows of a registry setting that I can tweak to fix this?

Thanks again, I'll let you know if I discover anything else.

Qianqian Fang

Apr 27, 2016, 12:27:27 AM
to mcx-...@googlegroups.com
On 4/26/2016 10:19 PM, Nicholas Dana wrote:
> Dr. Fang,
>
> I ran my script on a different Windows 7 machine with a different video card (Quadro K600) and saw the same artifact in the source.
>
> Additionally, I booted the original computer (with the two graphics cards) into its Ubuntu partition and ran the script using the GT 730 (the same card that generated the artifacts before) and got the same results you posted above (i.e. no artifacts at all).
>
> In summary, this does not appear to be a result of the hardware, but rather something particular to the Windows binary.

Thanks for the test and feedback.

I just recompiled mcxlab using a different RNG. Can you try the newly
uploaded *posix* Windows mcxlab and see if the issue is still present?

https://sourceforge.net/projects/mcx/files/mcxlab/2016.4%20%28Dark%20Matter%20beta%29/?upload_just_completed=true



> As to your comments regarding using respin with multiple GPUs: on Windows, even with two GPUs, I still hit the TdrDelay limit when using the second GPU. In my experience, Windows 7 applies the TdrDelay limit to both GPUs, regardless of whether or not one is rendering graphics. I've set my limit to 10 seconds to reduce the number of respins, but I've found no way to set a TdrDelay value for GPU 1 and not for GPU 2. Maybe someone else knows of a registry setting that I can tweak to fix this?

Really? On Linux, only the card connected to a display is
subject to a time limit; non-connected cards are unrestricted.
What about the registry trick? Can you get rid of TdrDelay altogether?
Message has been deleted

Nicholas Dana

May 4, 2016, 5:23:02 PM
to mcx-users
I posted a reply about a week ago, but it seems to have vanished.

> I just recompiled mcxlab using a different RNG. Can you try the newly
> uploaded *posix* Windows mcxlab and see if the issue is still present?

The posix compiled version cleared up the artifact in the source term and it also executed slightly faster with the same system/script.

> Really? On Linux, only the card connected to a display is
> subject to a time limit; non-connected cards are unrestricted.
> What about the registry trick? Can you get rid of TdrDelay altogether?

I've not found a way to have a TdrDelay value apply to only one GPU on a Windows system. You can turn the timeout detection off by setting the value of TdrLevel, but this means that if your driver crashes, the system won't auto-recover. However, I found that by setting TdrDelay to 10 seconds and parceling up the respins so that each executes in 5-6 seconds, you see only a roughly 10% decrease in photons/ms. That's what has worked best for me.
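
The chunking described above comes down to simple arithmetic. As a minimal sketch (the helper name `respins_for_budget` and its parameters are hypothetical and not part of MCX, and the throughput figure has to be measured on your own GPU), the number of chunks needed to keep each kernel run under a time budget could be estimated like this:

```c
#include <stdint.h>

/* Hypothetical helper, not part of MCX: smallest number of equal chunks
   such that each chunk is expected to finish within budget_ms, given a
   measured throughput in photons per millisecond. */
static uint32_t respins_for_budget(uint64_t nphoton,
                                   double photons_per_ms,
                                   double budget_ms)
{
    /* Photons that fit into one chunk without exceeding the budget. */
    double per_chunk = photons_per_ms * budget_ms;
    if (per_chunk < 1.0)
        per_chunk = 1.0;              /* never less than one photon per chunk */

    uint64_t cap = (uint64_t)per_chunk;

    /* Integer ceiling division: round the chunk count up. */
    return (uint32_t)((nphoton + cap - 1) / cap);
}
```

For example, 1e8 photons at a measured 2000 photons/ms with a 5000 ms budget gives 10 chunks. How the chunks map onto cfg.nphoton and cfg.respin depends on the MCXLAB version, so check `help mcxlab` before relying on it.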

Jeremy Ford

Feb 23, 2018, 7:32:12 PM
to mcx-users
Hello Dr. Fang,

I just started using mcxlab 2017.7 (Dark Matter RC2) on Windows 10 with an NVIDIA GTX 1060, CUDA 9.1, and MATLAB R2017a. I am running some tests and experiencing the same artifact that Nicholas described. I downloaded the posix version and tried running the script, but received an error:

###############################################################################
#    The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365      #
###############################################################################
$Rev::       $ Last $Date::                       $ by $Author::              $
###############################################################################
- variant name: [Fermi] compiled for GPU Capability [100] with CUDA [7000]
- compiled with: RNG [POSIX erand48] with Seed Length [4]
- this version CAN save photons at the detectors


GPU=1 (GeForce GTX 1060) threadph=48 extra=16960 np=1000000 nthread=20480 maxgate=50 repetition=1
initializing streams ... MCXLAB ERROR -13 in unit mcx_core.cu:1185
Error: invalid device symbol

I have seen on the forum that others have had this error, but I have not changed GPU cards as in the linked situation. I appreciate any help that you can provide.

Thank you.

Jeremy
FluenceImage.jpg
FluenceLineProfile.jpg
testMCX.m

Qianqian Fang

Feb 23, 2018, 7:33:53 PM
to mcx-...@googlegroups.com, Jeremy Ford
On 02/23/2018 06:51 PM, Jeremy Ford wrote:
> Hello Dr. Fang,
>
> I just started using mcxlab 2017.7 (Dark Matter RC2) on Windows 10 with an NVIDIA GTX 1060, CUDA 9.1, and MATLAB R2017a. I am running some tests and experiencing the same artifact that Nicholas described. I downloaded the posix version and tried running the script, but received an error:
>
> ###############################################################################
> #    The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365      #
> ###############################################################################
> $Rev::       $ Last $Date::                       $ by $Author::              $
> ###############################################################################
> - variant name: [Fermi] compiled for GPU Capability [100] with CUDA [7000]
> - compiled with: RNG [POSIX erand48] with Seed Length [4]
> - this version CAN save photons at the detectors
>
>
> GPU=1 (GeForce GTX 1060) threadph=48 extra=16960 np=1000000 nthread=20480 maxgate=50 repetition=1
> initializing streams ... MCXLAB ERROR -13 in unit mcx_core.cu:1185
> Error: invalid device symbol
>
> I have seen on the forum that others have had this error, but I have not changed GPU cards as in the linked situation. I appreciate any help that you can provide.

hi Jeremy

This issue was fixed in the nightly build version. Please
download it here:

http://mcx.space/nightly/win64/

Qianqian

Jeremy Ford

Feb 28, 2018, 3:15:56 PM
to Qianqian Fang, mcx-...@googlegroups.com
Hi Dr. Fang,

I downloaded and tested the nightly build, both the code in the MCXSuite and the mcxlab nightly build, and I continued to see this artifact.

One workaround I am trying is to test mcxlab on a Linux virtual machine or Linux partition, since these artifacts did not seem to occur with the Linux build. Ideally, though, I would like to get this working on Windows 10.

Thank you for the help.

Best,
Jeremy

----------------------
Ph.D. Candidate
Outreach Chair | Biomedical Engineering Graduate Student Association
Department of Biomedical Engineering
Vanderbilt University


Qianqian Fang

Feb 28, 2018, 6:32:49 PM
to Jeremy Ford, mcx-users
On 02/28/2018 03:15 PM, Jeremy Ford wrote:
> Hi Dr. Fang,
>
> I downloaded and tested the nightly build, both the code in the MCXSuite and the mcxlab nightly build, and I continued to see this artifact.
>
> One workaround I am trying is to test mcxlab on a Linux virtual machine or Linux partition, since these artifacts did not seem to occur with the Linux build. Ideally, though, I would like to get this working on Windows 10.

hi Jeremy

I apologize. I thought the issue in question was "invalid device symbol".
After reading the thread, I realized it was about photon propagation in mcx.

I confirm that I was able to reproduce this issue on Windows 10 with
CUDA 8 using the latest mcx code. I can also confirm that it does not
show up with Linux-based binaries.

With a little more investigation, I found the issue is more extensive
than just the disk source: it also seems to happen with Gaussian
sources and planar-like sources (4 equidistant points along the edge,
see attached figure). Yes, this only happens on Windows, and is
only observable with low scattering.

Just by looking at this figure, I suspect a random number
generator issue. It looks like one of the two random numbers determining
the x/y position of the photon tends to drop to 0 for some reason,
and there is a period associated with this drop, likely some sort of
self-correlation.
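
To illustrate why a random number stuck at 0 would show up as points on the source boundary, here is a minimal sketch. It assumes a planar source picks its launch position from two independent uniform numbers; the type `launch_pos` and the function `planar_launch` are hypothetical names, and the actual launch code in mcx_core.cu may differ.

```c
/* Hypothetical illustration, not MCX code: launch position inside a
   rectangular (planar) source spanned from the corner (x0, y0) with
   side lengths lx and ly. u1 and u2 are expected to be uniform in [0, 1). */
typedef struct {
    float x;
    float y;
} launch_pos;

static launch_pos planar_launch(float x0, float y0,
                                float lx, float ly,
                                float u1, float u2)
{
    launch_pos p;
    p.x = x0 + u1 * lx;   /* u1 == 0 pins the photon to the edge x = x0 */
    p.y = y0 + u2 * ly;   /* u2 == 0 pins the photon to the edge y = y0 */
    return p;
}
```

Under this assumption, every draw in which u1 comes back as exactly 0 lands on the edge x = x0, so a generator that returns 0 at regular intervals would pile photons onto the boundary in a repeating pattern.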

I will investigate this further. The RNG currently in use (xorshift128+)
is a different one from the RNG in use when this was initially reported,
so I am not entirely sure why this issue persists. Because xorshift128+
uses 64-bit data structures to store its RNG states, it is also
possible that the Windows NVIDIA driver has some sort of bug
related to 64-bit data processing.
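
For readers unfamiliar with the generator, the sketch below shows one published parameterisation of xorshift128+ (shift constants 23, 17, 26). This is an assumption for illustration only: the constants and the float conversion used inside MCX may differ. It shows how the two 64-bit state words are mixed, and why a state with very few bits set needs several steps before its outputs look random.

```c
#include <stdint.h>

/* Textbook xorshift128+ step with shift constants 23, 17 and 26.
   Illustrative only; MCX's own variant may use different constants.
   The state s[0], s[1] must not be all zero: an all-zero state maps
   to itself and the generator then returns 0 forever. */
static uint64_t xorshift128plus(uint64_t s[2])
{
    uint64_t s1 = s[0];
    const uint64_t s0 = s[1];

    s[0] = s0;                 /* rotate the state words */
    s1 ^= s1 << 23;            /* mix s1 with shifted copies of itself */
    s[1] = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);

    return s[1] + s0;          /* output is the sum of the two new words */
}
```

Starting from the sparse state {1, 0}, the first output is 8388673, which fits in 24 bits, so the upper 40 bits of the 64-bit result are still zero. Sparse states like this take a number of steps before the high bits fill in.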

Could you open an issue on GitHub and upload your test script and
results? I will use that issue to post progress updates.

https://github.com/fangq/mcx/issues

thanks again for reporting this.

Qianqian
rng_defect_windows.png

Qianqian Fang

Mar 11, 2018, 5:52:04 PM
to Jeremy Ford, mcx-users, Nicholas Dana
hi Jeremy

just to let you know that the bug reported by you (and initially by Nicholas Dana
back in 2016) was finally fixed. Please see the bug tracker Issue#31 on github.

https://github.com/fangq/mcx/issues/31

This bug impacts all simulations using 2016.4 and newer releases of
MCX/MCXLAB on Windows only, where the default RNG was changed from
logistic-lattice to xorshift128+ in 2016. The issue is especially prominent
in simulations with low photon numbers.

The bug was caused by the low precision of the Windows host
random number generator (RNG), which can only create 15 random bits. As a
result, the first hundreds of values in the GPU random number sequences
contain zeros, until these zero bits are replaced after many steps. The
pattern you observed was a result of these zero values, which in turn
came from the incorrectly initialized RNG states.
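
The 15-bit limitation can be sketched as follows. On the Windows C runtime, RAND_MAX is 32767, so a single rand() call carries only 15 random bits. The helper names below (`rand15`, `seed_word_naive`, `seed_word_full`) are hypothetical, and the repair shown is just one possible approach, not necessarily what the linked commit does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a host rand() with RAND_MAX == 32767,
   as on the Windows C runtime: only the low 15 bits are random. */
static uint32_t rand15(void)
{
    return (uint32_t)rand() & 0x7FFFu;
}

/* Naive seeding: one 15-bit draw per 32-bit seed word.
   The upper 17 bits of every seed word are always zero. */
static uint32_t seed_word_naive(void)
{
    return rand15();
}

/* One possible repair: stitch three 15-bit draws into a 32-bit word.
   The draws are taken in separate statements so their order is defined. */
static uint32_t seed_word_full(void)
{
    uint32_t a = rand15();   /* becomes bits 0..14  */
    uint32_t b = rand15();   /* becomes bits 15..29 */
    uint32_t c = rand15();   /* low 2 bits become bits 30..31 */
    return (c << 30) | (b << 15) | a;
}
```

With the naive version, every 32-bit word of the GPU RNG state starts with 17 zero bits, which matches the description of incorrectly initialized states above.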

A fix has been posted and the Windows mcx/mcxlab binaries have been updated; see

https://github.com/fangq/mcx/commit/a0d445bb43768547ee8c00043917cd38ad1f68da

Please download the nightly build version and give it a try.

http://mcx.space/nightly/win64/

Qianqian



Qianqian Fang

Mar 11, 2018, 6:21:43 PM
to Jeremy Ford, mcx-users, Nicholas Dana
On 03/11/2018 05:51 PM, Qianqian Fang wrote:

> hi Jeremy
>
> just to let you know that the bug reported by you (and initially by Nicholas Dana
> back in 2016) was finally fixed. Please see the bug tracker Issue#31 on github.
>
> https://github.com/fangq/mcx/issues/31
>
> This bug impacts all simulations using 2016.4 and newer releases of
> MCX/MCXLAB on Windows only, where the default RNG was changed from
> logistic-lattice to xorshift128+ in 2016. The issue is especially prominent
> in simulations with low photon numbers.


Looking back at the bug tracker on GitHub, I realized that this bug actually
impacts all previous releases of mcx on Windows!

There are 4 bug reports in total related to this issue:

https://github.com/fangq/mcx/issues/17
https://github.com/fangq/mcx/issues/26
https://github.com/fangq/mcx/issues/29
https://github.com/fangq/mcx/issues/31

It is my understanding that all 4 issues share the same root cause.

The artifacts in bug#17 and bug#26 (hollow center and a radial line)
seem to be the manifestation of this bug in the old logistic-lattice RNG,
while the artifacts in bug#29 and bug#31 (4 equidistant dots) are
the manifestation of this issue in the xorshift128+ RNG, which became
the default in April 2016.

in any case, I urge everyone who runs mcx/mcxlab on Windows to
upgrade your software to the nightly-build version at

http://mcx.space/nightly/win64/

I apologize for taking so long to connect the dots and eventually
provide a fix for this issue.

Qianqian

Jeremy Ford

Mar 14, 2018, 9:55:28 AM
to mcx-users
Hello Dr. Fang,

Thank you for looking into this and fixing it.  I have been playing around with the nightly build on Windows 10 and it seems to be running just fine now. I appreciate you addressing this issue so quickly. Thank you.

Best,
Jeremy