" python ~/software/DANPOS3/danpos.py dpos S8_MN_merge.bam:S24_MN_merge.bam -m 1 --mifrsz 140 --mafrsz 180 -s 1 -a 1 -t 1e-5 -p 1e-5 "
It raises the following error:
"position level integrative analysis for H_MN_merge_sort_position:G_MN_merge ...
FDR simulation...
Traceback (most recent call last):
  File "/home/huangyingzhang/software/DANPOS3/danpos.py", line 1365, in <module>
    if sys.argv[1]=='dpos':runDANPOS(command='dpos')
  File "/home/huangyingzhang/software/DANPOS3/danpos.py", line 424, in runDANPOS
    pcfer=0)#args.pcfer)#0)
  File "/home/huangyingzhang/software/DANPOS3/functions.py", line 501, in danpos
    fuzFDRlist=fuzFDR(cwig=pooledgroups[groupnames[1]],twig=pooledgroups[groupnames[0]],simu=fdrsimu,rd=rd,regions=peaks)
  File "/home/huangyingzhang/software/DANPOS3/functions.py", line 1295, in fuzFDR
    tempv=log10fuztest(pc=p,pt=p,cr=cr,cwig=cwig,twig=twig,rd=rd)
  File "/home/huangyingzhang/software/DANPOS3/functions.py", line 1343, in log10fuztest
    unnumpy(ct),unnumpy(cc), log_bool=True)
  File "/home/huangyingzhang/software/DANPOS3/functions.py", line 66, in pf
    answer = log(answer)
ValueError: math domain error "
But when I run each sample on its own, with
python ~/software/DANPOS3/danpos.py dpos S8_MN_merge.bam -m 1 --mifrsz 140 --mafrsz 180 -s 1 -a 1 -t 1e-5 -p 1e-5
or
python ~/software/DANPOS3/danpos.py dpos S24_MN_merge.bam -m 1 --mifrsz 140 --mafrsz 180 -s 1 -a 1 -t 1e-5 -p 1e-5
everything is fine.
Can anyone help me to fix this problem? Thanks a lot.
Yingzhang
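The problem is an error where the program tries to calculate log(0). The 0 comes from floating-point underflow in the p-value calculation: the p-value is smaller than a Python float can represent, so it is stored as 0 instead. Below 2.2250738585072014e-308 (sys.float_info.min, i.e. log10 p-values below about -308) floats start losing precision, and below about 5e-324 (log10 p-values below about -323) they become exactly 0. If the p-value is computed as 1 - cdf (as pf does when lower_tail is False), it becomes 0 much earlier: as soon as the true tail probability drops below about 1e-16, the cdf rounds to exactly 1.0.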
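These limits are easy to check in a Python session. A minimal standard-library sketch (the names are illustrative, not taken from DANPOS):

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2.2250738585072014e-308
SMALLEST_SUBNORMAL = 5e-324           # equal to 2**-1074, the smallest positive double

# Below the normal range, doubles lose precision gradually (subnormals);
# halving the smallest subnormal rounds to exactly 0.0.
underflowed = SMALLEST_SUBNORMAL / 2


def log_raises(x):
    """Return True if math.log(x) raises the ValueError seen in the traceback."""
    try:
        math.log(x)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(SMALLEST_NORMAL, SMALLEST_SUBNORMAL, underflowed)
    print(log_raises(SMALLEST_SUBNORMAL), log_raises(underflowed))
```

Any positive float, however small, still has a finite logarithm; only an exact 0.0 (or a negative number) triggers the "math domain error".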
The fix from above stops DANPOS from crashing, but it writes 0 into the result table for fuzziness log10 p-values that in reality are very, very low. Unfortunately, positions with a p-value of 1 (absolutely not significant) also get a log10 p-value of log10(1) = 0, so the two cases look the same. I don't know how often that happens.
For these cases, the following fix gives -Inf instead of 0, which is better. In functions.py, change:
# equivalent of pf function in R
# q - vector of quantiles
# df1, df2 - degrees of freedom
# lower_tail - logical; if TRUE probabilities are
# P[x<=x], otherwise P[X>x]
# log_bool - if true probabilities p are given as log(p)
def pf(q,df1, df2, lower_tail=True, log_bool=False):
    answer = f.cdf(q, dfn=df1,dfd=df2)
    if not lower_tail:
        answer=1-answer
    if log_bool:
        answer = log(answer)
    return(answer)
to
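# equivalent of pf function in R
# q - vector of quantiles
# df1, df2 - degrees of freedom
# lower_tail - logical; if TRUE probabilities are
# P[X<=x], otherwise P[X>x]
# log_bool - kept so existing calls such as pf(..., log_bool=True) still work;
#   this version always returns log10(p)
def pf(q, df1, df2, lower_tail=True, log_bool=False):
    if lower_tail:
        answer = f.logcdf(q, dfn=df1, dfd=df2)
    else:
        answer = f.logsf(q, dfn=df1, dfd=df2)  # log of P[X>x] without computing 1 - cdf
    return(answer/log(10))  # divide by log(10) to get log10Pval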
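Working in log space avoids log(0) for the lower tail. For an upper-tail p-value, the analogous step is to compute the tail directly instead of as 1 - cdf; scipy.stats.f offers sf and logsf for that. The sketch below uses only the standard library, with the standard normal distribution (via math.erf and math.erfc) standing in for the F distribution; both function names are hypothetical.

```python
import math


def normal_upper_tail_naive(x):
    """P[Z > x] computed as 1 - cdf, the same pattern as answer = 1 - f.cdf(...)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - cdf


def normal_upper_tail_direct(x):
    """P[Z > x] computed directly from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


x = 10.0
naive = normal_upper_tail_naive(x)    # 0.0, because the cdf rounds to exactly 1.0
direct = normal_upper_tail_direct(x)  # a tiny but nonzero probability
log10_direct = math.log10(direct)     # finite; math.log10(naive) would raise ValueError

if __name__ == "__main__":
    print(naive, direct, log10_direct)
```

The direct version keeps about 23 orders of magnitude that the 1 - cdf version throws away; the same reasoning is why f.sf or f.logsf is preferable to 1 - f.cdf when a tiny upper tail matters.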
Hello,
I am posting a solution here to the math domain error raised by the pf and ppois functions in functions.py.
I got a math domain error in the pf function while running dtriple in DANPOS3. The solutions posted earlier in this thread did not work for a particular dataset I am analysing. Thanks to some Python experts I talked to, I was able to run dtriple with the solution described here. The same approach can be used for the similar error in the ppois function.
When "answer" underflows to exactly zero in either the ppois or the pf function, log(answer) throws an error because the logarithm of 0 does not exist. We don't know exactly how these two functions are used in dtriple or how they relate to the overall functionality of DANPOS. However, to work around the error, one fix is to set "answer" to the natural logarithm of a number near the smallest positive float Python 3 can represent. For 1e-323 (which Python actually stores as about 9.88e-324) this is about -743.75, which is rounded to -744 in the functions.py code below.
Another option is to keep answer as the raw value instead of its logarithm (in which case it will be 0). However, the DANPOS programmer's intent was to offer either the raw value (which never throws an error) or its logarithm, so returning -744 is closer to that intent, hence this suggestion. The print commands here are useful for seeing how many errors occur but are not needed to get the program to run.
The two modified functions from functions.py are shown below. The edited portion is the try/except block around answer = log(answer).
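The numbers behind the -744 fallback can be checked directly. A short standard-library sketch; the constant names are made up for illustration:

```python
import math

# The literal 1e-323 is not exactly representable; Python rounds it to the
# nearest subnormal double, 2 * 2**-1074 (about 9.88e-324).
NEAR_MIN = 1e-323
SMALLEST_POSITIVE = 2**-1074  # about 4.94e-324, the smallest positive double

LOG_NEAR_MIN = math.log(NEAR_MIN)           # natural log, about -743.75
LOG_SMALLEST = math.log(SMALLEST_POSITIVE)  # natural log, about -744.44
FALLBACK = round(LOG_NEAR_MIN)              # -744, the value used in the fix

if __name__ == "__main__":
    print(NEAR_MIN, LOG_NEAR_MIN, LOG_SMALLEST, FALLBACK)
```

One consequence worth knowing: -744 is not below every finite result, since a nonzero answer equal to the smallest positive double has a natural log of about -744.44.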
Change 1: ppois function
# equivalent of ppois function in R
# q - vector of quantiles
# mean - (AKA lambda) vector of (non-negative) means
# lower_tail - logical; if TRUE probabilities are
# P[X<=x], otherwise P[X>x]
# log_bool - if TRUE, probabilities p are given as log(p)
def ppois(q, mean, lower_tail=True, log_bool = False):
    answer=poisson.cdf(q,mean)
    if not lower_tail:
        answer=1-answer
    if log_bool:
        try:
            answer = log(answer)
        except ValueError:
            print("Error1") #Can be removed. Gives an idea as to how many samples fit this pattern.
            answer = -744
    return(answer)
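Instead of clamping to -744, a tail probability can also be summed entirely in log space, so that even very small p-values come out as real (finite) numbers. The sketch below does this for the Poisson distribution with the standard library only; log_ppois and its helpers are hypothetical and not part of DANPOS. If the poisson object used above is scipy.stats.poisson, as its cdf call suggests, its sf and logsf methods are probably the simpler route.

```python
import math


def log_pois_pmf(k, mu):
    """Natural log of the Poisson pmf P[X = k] for mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def logsumexp(values):
    """log(sum(exp(v))) without underflow, by factoring out the largest term."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_ppois(q, mu, lower_tail=True):
    """Natural log of P[X <= q] (lower_tail=True) or P[X > q], summed in log space.

    q is a non-negative integer and mu > 0.
    """
    if lower_tail:
        return logsumexp([log_pois_pmf(k, mu) for k in range(q + 1)])
    terms = []
    k = q + 1
    while True:
        term = log_pois_pmf(k, mu)
        terms.append(term)
        # Past the mode the terms only shrink; stop once they are e**-50 below the first one
        if k > mu and term < terms[0] - 50:
            break
        k += 1
    return logsumexp(terms)


if __name__ == "__main__":
    # Upper tail far beyond the mean: 1 - cdf would be 0.0 here
    print(log_ppois(100, 10, lower_tail=False) / math.log(10))  # a log10 p-value
```

The lower tail is a plain finite sum; the upper tail is truncated once further terms are negligible, which is why it needs a stopping rule.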
Change 2: pf function
# equivalent of pf function in R
# q - vector of quantiles
# df1, df2 - degrees of freedom
# lower_tail - logical; if TRUE probabilities are
# P[X<=x], otherwise P[X>x]
# log_bool - if true probabilities p are given as log(p)
def pf(q,df1, df2, lower_tail=True, log_bool=False):
    answer = f.cdf(q, dfn=df1,dfd=df2)
    if not lower_tail:
        answer=1-answer
    if log_bool:
        try:
            answer = log(answer)
        except ValueError:
            print("Error2") #Can be removed. Gives an idea as to how many samples fit this pattern
            answer = -744
    return(answer)
Change 2 alone (the change to the pf function) was enough for the dataset I am analysing, and I got the expected output. I did see a NaN warning and "Error2" printed for a few points, but the job completed successfully.
To diagnose the impact of the edit further, you can also print the value of answer (the error occurs when answer is 0), together with q and mean for the ppois function, or q, df1 and df2 for the pf function.
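One way to collect that information without scattering print calls is a small helper that records each fallback. A minimal sketch; log_with_fallback, fallback_counts and fallback_inputs are hypothetical names, and the -744 default mirrors the fix above:

```python
import math
from collections import Counter

# Hypothetical diagnostics, in place of the print("Error1") / print("Error2") lines
fallback_counts = Counter()  # how often each function hit log(0)
fallback_inputs = []         # (function name, answer, inputs) for every hit


def log_with_fallback(answer, name, inputs, fallback=-744.0):
    """Return math.log(answer); on a math domain error, record the inputs and return fallback."""
    try:
        return math.log(answer)
    except ValueError:
        fallback_counts[name] += 1
        fallback_inputs.append((name, answer, inputs))
        return fallback


if __name__ == "__main__":
    log_with_fallback(0.0, "pf", {"q": 80.0, "df1": 1, "df2": 3})
    log_with_fallback(0.25, "ppois", {"q": 3, "mean": 7.5})
    print(fallback_counts)
    print(fallback_inputs)
```

Inside pf, the log step would then become something like answer = log_with_fallback(answer, "pf", {"q": q, "df1": df1, "df2": df2}), and the counters can be printed once at the end of the run.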
Best,
Yashoda