bigWigAverageOverBed min > max


Hamutal Arbel
Dec 22, 2015, 4:57:02 PM
to gen...@soe.ucsc.edu
Hello,

I am trying to compute the maximum ChIP-chip score of transcription factors over certain intervals, using BDTNP data downloaded from the UCSC Genome Browser. When I compute it with bigWigAverageOverBed and the -minMax option, I get some very strange results:

<pre>
chr2L  26400   29300   chr2L256   1.40337  1.1121    1.70216
chr2L  88900   92500   chr2L881   0        1.7e+308  -1.7e+308
chr2L  155200  155500  chr2L1544  0        1.7e+308  -1.7e+308
chr2L  160600  162500  chr2L1598  1.20521  1.16347   1.24457
chr2L  165500  167200  chr2L1647  0        1.7e+308  -1.7e+308
</pre>

As you can see, although the program works fine for some lines, in others I get 1.7e+308 and -1.7e+308, the computational equivalents of infinity and -infinity. Stranger still, the minimum and maximum in these lines are reversed. I suspect these are regions with no signal (no TF binding at all). Is this the usual behavior of the program in such cases? Is it intentional?

Thank you,
Hamutal Arbel

Brian Lee
Jan 4, 2016, 4:13:36 PM
to Hamutal Arbel, gen...@soe.ucsc.edu
Dear Hamutal,

Thank you for using the UCSC Genome Browser and for your question about Berkeley Drosophila Transcription Network Project (BDTNP) data and the bigWigAverageOverBed utility.

In future correspondence, please provide more specifics about the file you are investigating. Based on the information you provided, I believe you may be looking at the following track data, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=dm3&g=bdtnpChipper, with related files located in these directories:

It looks as though you are correct about this unusual behavior of bigWigAverageOverBed: when there is no data in a region (indicated by the zero in the fifth column), the minimum and maximum are reported as very large positive and negative values (1.7e+308 and -1.7e+308, effectively +infinity and -infinity), exactly as you describe:

chr2L 165500 167200 chr2L1647 0 1.7e+308 -1.7e+308
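The reversed values are consistent with a common pattern in which a running minimum starts at a very large number and a running maximum at a very large negative number, so that the first real data point replaces both. If a region contains no data points, neither value is ever updated and the sentinels are printed as-is. The Python sketch below only illustrates that pattern; it is an assumption about the cause, not code taken from bigWigAverageOverBed, and SENTINEL is a hypothetical constant chosen to match the printed values.

```python
# Illustrative sketch only: a running min/max that starts from sentinel values.
# SENTINEL is a hypothetical constant chosen to match the 1.7e+308 seen in the
# output; it is not taken from the bigWigAverageOverBed source.
SENTINEL = 1.7e308


def running_min_max(values):
    """Return (minimum, maximum) over values using sentinel initialisation."""
    lo, hi = SENTINEL, -SENTINEL  # start above and below any real data value
    for v in values:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    # With no values the loop never runs, so the sentinels come back unchanged
    # and the "minimum" is larger than the "maximum".
    return lo, hi


print(running_min_max([1.1121, 1.40337, 1.70216]))  # (1.1121, 1.70216)
print(running_min_max([]))  # (1.7e+308, -1.7e+308)
```

This is why an empty region shows min > max: the reversal is simply the untouched starting values.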

We have created a bug ticket to address this issue, but at this time it is not clear when the change will be released. 
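Until a fix is released, one possible workaround is to post-process the output and blank out the affected rows. The Python sketch below uses hypothetical helper names (fix_empty_min_max and fix_stream are illustrative, not part of any UCSC tool). It assumes the last two whitespace-separated columns of each row are the -minMax minimum and maximum, as in the output quoted in the question, and treats any row whose minimum exceeds its maximum as a region with no data.

```python
def fix_empty_min_max(line, na="NA"):
    """Replace sentinel min/max with `na` when min > max (no data in region).

    Assumes whitespace-separated fields whose last two columns are the
    -minMax minimum and maximum; rows are re-joined with tabs.
    """
    fields = line.split()
    if len(fields) < 2:
        return line.rstrip("\n")
    lo, hi = float(fields[-2]), float(fields[-1])
    if lo > hi:  # only happens when no data points were seen in the interval
        fields[-2:] = [na, na]
    return "\t".join(fields)


def fix_stream(lines):
    """Yield corrected rows from an iterable of output lines, skipping blanks."""
    for raw in lines:
        if raw.strip():
            yield fix_empty_min_max(raw)
```

For example, after `import sys`, the loop `for out in fix_stream(sys.stdin): print(out)` would filter a saved output file piped in on standard input. Checking min > max rather than the zero in the fifth column avoids discarding a region that does have data but happens to average to zero.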

You may also be interested in using the hgWiggle utility for these queries. Here are some resources:
 
If you create a .hg.conf file, you should be able to run commands such as the following, which query the public MySQL server (in this case, the dm3 bdtnpCad1Fdr1 table for "BDTNP ChIP/chip: caudal (cad) antibody 1, stage 4-5 embryos, False Discovery Rate (FDR) 1%"):

<pre>
$ hgWiggle -db=dm3 -doStats -position=chr2L:165500-167200 bdtnpCad1Fdr1
# chrom specified: chr2L
# position specified: 165500-167200
# stats: no data points found

$ hgWiggle -db=dm3 -doStats -position=chr2L:160600-162500 bdtnpCad1Fdr1
# Database: dm3, Table: bdtnpCad1Fdr1
# Chrom  Data    Data    # Data  Data  Bases    Minimum  Maximum  Range    Mean     Variance  Standard
#        start   end     values  span  covered                                                deviation
chr2L    160613  162480  51      1     51       2.11204  3.77278  1.66074  2.89194  0.240442  0.490349
</pre>

As you can see, for a region without data hgWiggle instead returns the message "no data points found". For hgWiggle to work, the .wib file must be in place (http://hgdownload.soe.ucsc.edu/gbdb/dm3/bdtnp/bdtnpCad1Fdr1.wib for the above example). You may wish to look through our mailing list to learn more about how to use the hgWiggle command: https://groups.google.com/a/soe.ucsc.edu/forum/#!msg/genome/t-maWef8wKk/sNAWIsKR1wcJ
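To repeat the hgWiggle query for every interval in a BED file, it can be driven from a small script. The Python sketch below is hypothetical (bed_position, hgwiggle_command, has_data and run_stats are illustrative names). It assumes hgWiggle is on your PATH, a working .hg.conf, and that the "no data points found" message is written to standard output as in the transcript above. Coordinates are passed through unchanged, exactly as in the example commands, and regions without data are recorded as None instead of extreme values.

```python
import subprocess


def bed_position(bed_line):
    """Turn the first three BED columns into a chrom:start-end position string."""
    chrom, start, end = bed_line.split()[:3]
    return f"{chrom}:{start}-{end}"


def hgwiggle_command(position, table, db="dm3"):
    """Build the hgWiggle -doStats argument list used in the examples above."""
    return ["hgWiggle", f"-db={db}", "-doStats", f"-position={position}", table]


def has_data(stats_output):
    """False when hgWiggle reports that the region has no data points."""
    return "no data points found" not in stats_output


def run_stats(bed_lines, table, db="dm3"):
    """Run hgWiggle once per interval; map each position to its stats or None."""
    results = {}
    for line in bed_lines:
        if not line.strip():
            continue
        position = bed_position(line)
        cmd = hgwiggle_command(position, table, db)
        # check=True raises CalledProcessError if hgWiggle exits with an error.
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        results[position] = out if has_data(out) else None
    return results
```

For example, `run_stats(open("regions.bed"), "bdtnpCad1Fdr1")` (with a hypothetical regions.bed) would return one entry per interval; `capture_output` and `text` require Python 3.7 or later.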

You may also be interested in bwtools, by Andy Pohl, which can handle many kinds of intersections with bigWig files:

Thank you again for bringing this issue with bigWigAverageOverBed to our attention and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute
