Unusual negative likelihood ratios using bdgdiff

49 views
Skip to first unread message

Mariam

unread,
Sep 26, 2018, 6:23:01 PM9/26/18
to MACS announcement

Hi,


I am trying to use the bdgdiff subcommand to obtain regions with differential enrichment between two treatments following the instructions at: https://github.com/taoliu/MACS/wiki/Call-differential-binding-events. All the alignments, peak calling, etc. looks fine for these datasets. However, I get weird likelihood ratios for the bdgdiff output (see below for an example):



chr1 3451032 3451389 diff.T_vs_TV_common_1 -117554300858044311035353075854344192.00000

chr1 3451643 3452058 diff.T_vs_TV_common_2 -109741027444814162005763954525077504.00000

chr1 4774898 4776111 diff.T_vs_TV_common_3 -33142591245442814881061616199139328.00000

chr1 5185184 5185658 diff.T_vs_TV_common_4 -91960197554948816626121857505951744.00000

chr1 5969790 5970081 diff.T_vs_TV_common_5 -147659675541742645393245085120856064.00000


I can see some other people have encountered the same issue, but I was not able to find a solution to fix this. Has anyone been able to troubleshoot and fix this?


Thanks,

Mariam

Moshe Olshansky

unread,
Sep 30, 2018, 10:54:52 PM9/30/18
to MACS announcement
Hi Mariam,

As you have mentioned you are not the only one who encountered this problem... And it happens in some cases and does not happen in the others. It is definitely MACS bug. Since we have the code it is possible to trace the source of the error and correct it but I have never had time for this. I think that it happens more frequently when you keep more than one duplicate (see the --keep-duplicates option).

Good luck,
Moshe.

Mariam Okhovat

unread,
Oct 1, 2018, 11:20:20 AM10/1/18
to macs-ann...@googlegroups.com
Hi Moshe,

Thank you for your reply. I actually ended up running the same scripts a few more times and noticed that each time I got different likelihood ratios. So, I repeated the scripts until I got the "correct" ratios (the numbers made sense). 

Thanks,
Mariam

--
You received this message because you are subscribed to a topic in the Google Groups "MACS announcement" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/macs-announcement/l_LOpQ0kwnU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to macs-announcem...@googlegroups.com.
To post to this group, send email to macs-ann...@googlegroups.com.
Visit this group at https://groups.google.com/group/macs-announcement.
For more options, visit https://groups.google.com/d/optout.

Tao Liu

unread,
Oct 1, 2018, 1:45:37 PM10/1/18
to macs-ann...@googlegroups.com
Hi Moshe and Mariam,

It looks like an overflow issue. At a 32bit system (either your OS or Python is compiled 32bit), the range of a float value is -3.4E+38 to +3.4E+38. So on those systems, even though I have declared the score (log likelihood ratio) and many other variables to be a ‘double’ value in the source code, an overflow issue may occur. A simple solution is that, to run MACS2 on a modern Linux64 system and to make sure the Python installed is compiled in 64bit. Check it with the following command:

$python -c 'import sys,platform; print (platform.architecture()[0], sys.maxsize > 2**32)'

If it shows ‘(‘64bit', True)’ then your system should be able to deal with large value.

Best,
Tao
> You received this message because you are subscribed to the Google Groups "MACS announcement" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to macs-announcem...@googlegroups.com.

Mariam Okhovat

unread,
Oct 1, 2018, 5:23:02 PM10/1/18
to macs-ann...@googlegroups.com
Hi Tao, 

Both the systems I tried return ‘(‘64bit', True)’. Interestingly, the log likelihood ratios came out differently sometimes even when I ran the same script on different days/times (same exact settings!). Occasionally, the ratios were extremely small (negative) and other times they were huge (positive), but then often they came out looking normal and ranged between ~0-1 for the "common' file and were larger ~3-10 for the treatment specific files and I assumed that these were the true log likelihood ratios.

Thanks,
Mariam

Moshe Olshansky

unread,
Oct 1, 2018, 7:52:14 PM10/1/18
to MACS announcement
Hi Tao,

Floating point arithmetic has nothing to do with 32 bit or 64 bit architecture. This is for the length of memory addresses (and so on a 32 machine you can not (easily) use more than 4 GB of RAM).
The IEEE Standard for floating point (IEEE 754) was published in 1985 and stated that a single precision (float) number is 32 bits long with 8 bits for (binary) exponent while double precision number is 64 bits with 11 bits exponent. Even at that time when the computers were either 32 bit or 16 bit, Intel x87 math co-processor had a floating point register of 80 bits and IBM RS/6000 processor (announced around that time) had 64 bits long floating point registers.

Based on what Mariam is saying, there is some "randomness" in the result which most likely means memory leak.

Best regards,
Moshe.

On Thursday, September 27, 2018 at 8:23:01 AM UTC+10, Mariam wrote:

Tao Liu

unread,
Oct 2, 2018, 10:18:31 AM10/2/18
to macs-ann...@googlegroups.com
Hi Moshe and Mariam,

Thanks for your feedbacks!

If there is a memory leak problem, that would happen in the logLR_asym function in ScoreTrack.pyx. Let me check it out.

Best,
Tao Liu

Tao Liu

unread,
Oct 2, 2018, 2:11:30 PM10/2/18
to macs-ann...@googlegroups.com
It may due to the khash table I used to cache computed likelihood ratio values. Sorry that I can’t reproduce the error so can’t be 100% sure that’s the reason. On GitHub site, John Gaspar (jsh58) fixed an issue related to khash while calculating pvalue scores.

https://github.com/taoliu/MACS/pull/240/commits/ed80372298e64877c1fee0a3d38b220e9420ce94

My guess is that the same fix has to be applied to likelihood ratio calculation as well.

At the top of the following source code

https://github.com/taoliu/MACS/blob/ed80372298e64877c1fee0a3d38b220e9420ce94/MACS2/IO/ScoreTrack.pyx

The two logLR functions are defined as:

<code>

asym_logLR_khashtable = Int64HashTable()

cdef inline double logLR_asym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
chromatin bias ). Set minus sign for depletion.

*asymmetric version*
"""
cdef:
double s
long key_value

key_value = hash( (x, y ) )
try:
return asym_logLR_khashtable.get_item( key_value )
except KeyError:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif x < y:
s = (x*(-log(x)+log(y))-y+x)*LOG10_E
else:
s = 0
asym_logLR_khashtable.set_item(key_value, s)
return s

sym_logLR_khashtable = Int64HashTable()

cdef inline double logLR_sym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
another enriched ). Set minus sign for H0>H1.

* symmetric version *
"""
cdef:
double s
long key_value

key_value = hash( (x, y ) )
try:
return sym_logLR_khashtable.get_item( key_value )
except KeyError:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif y > x:
s = (y*(log(x)-log(y))+y-x)*LOG10_E
else:
s = 0
sym_logLR_khashtable.set_item(key_value, s)
return s

</code>

It should be replaced as the following, in order to avoid using khash:

<code>

asym_logLR_dict = dict()

cdef inline double logLR_asym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
chromatin bias ). Set minus sign for depletion.

*asymmetric version*
"""
cdef:
double s

if asym_logLR_dict.has_key( (x, y) ):
return asym_logLR_dict[ (x, y) ]
else:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif x < y:
s = (x*(-log(x)+log(y))-y+x)*LOG10_E
else:
s = 0
asym_logLR_dict[(x,y)] = s
return s

sym_logLR_dict = dict()

cdef inline double logLR_sym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
another enriched ). Set minus sign for H0>H1.

* symmetric version *
"""
cdef:
double s

if sym_logLR_dict.has_key( (x, y) ):
return sym_logLR_dict[ (x, y) ]
else:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif y > x:
s = (y*(log(x)-log(y))+y-x)*LOG10_E
else:
s = 0
sym_logLR_dict[ (x,y) ] = s
return s

</code>

If you can, please give it a try. You have to download MACS2 source code from GitHub, edit the source code and recompile and install MACS2. I will update MACS2 source code officially soon.

Best,
Tao

Moshe Olshansky

unread,
Oct 2, 2018, 8:58:19 PM10/2/18
to MACS announcement
Hi Tao,

Thank you.

I am very busy till Wednesday the 10th but after this I will try what you said and will report the results.


On Thursday, September 27, 2018 at 8:23:01 AM UTC+10, Mariam wrote:
Reply all
Reply to author
Forward
0 new messages