question about bedgraph to bigwig conversion


Terumi Kohwi-Shigematsu

Oct 16, 2020, 3:33:04 PM
to gen...@soe.ucsc.edu, Terumi Kohwi-Shigematsu, Kohwi-Shigematsu, Terumi
Dear UCSC genome browser team,
Until now, we have successfully viewed our ChIP-seq data as tracks in the UCSC Genome Browser. For some unknown reason, on the same PC, we have now run into a problem in bash at the bedGraph-to-bigWig conversion step. This has never happened before.
I would appreciate your help with this issue.

I use Bowtie2 --> Macs2 --> bdg2bw ChIPfile_treat_pileup.bdg hg19-chromInfo.txt
However, the conversion reports the error "overlapping regions in bedGraph line 76017978 of ChIPfile.bdg.sort.clip", and the Genome Browser says "it is not a bigwig file" even though the converted file looks like a bigWig file.

Then, I used the commands below.
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s in.bdg > out.bdg.tmp
bedtools merge -i out.bdg.tmp -c 4 -d 0 -o max > out.bdg
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s out.bdg > out.bdg.sorted
bedGraphToBigWig out.bdg.sorted hg19-chromInfo.txt out.bigwig

The last command did not work for our files; the error says, "End coordinate 16581 bigger than chrM size of 16571 line 23 of ChIPfile.bedg.sorted".
So I actually used the first two commands, followed by the original conversion line "bdg2bw ChIPfile_treat_pileup.bdg hg19-chromInfo.txt".
I also tried "max", "min" and "mean" as the -o operation on the second line.
The results are here:
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr3%3A18389133%2D18480265&hgsid=913426339_C4A4z77VsxAdVVJ81FAZt1J65IPN

MERGE-mean: latest sample using "mean"
OLD-GOOD: the old sample using our original commands without errors <-- we want this shape
OLD-merge-mean: the same old sample using new commands with "min"
MERGE-MAX: latest sample using "max"

We have no idea why the commands failed this time. To make sure, we also tried other datasets stored on our server, and none of them work. The problem is not due to using a new computer. We remain deeply puzzled. Could you help us resolve this? If you need anything from us, we can send it to you.

Thank you so much for your time and effort in looking into our problem.
Sincerely,
Terumi Kohwi-Shigematsu
Professor
UCSF
San Francisco, CA
tel mobile 510-410-2833

Matthew Speir

Oct 20, 2020, 4:27:06 PM
to Terumi Kohwi-Shigematsu, UCSC Genome Browser Discussion List, Kohwi-Shigematsu, Terumi
Hello, Terumi.

Thank you for your question about using bedGraphToBigWig.

The first error you note ("overlapping regions in bedGraph line 76017978 of ChIPfile.bdg.sort.clip") is likely because MACS2, on rare occasions, produces overlapping regions. One of our engineers shares that he has used the following awk code, from https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/62, to resolve the issue in the past:

cat bedgraph.bed | awk 'BEGIN{OFS="\t"}{if (NR>1 && prev_chr==$1 && prev_chr_e<=$2) {print $0}; prev_chr=$1; prev_chr_e=$3;}'
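
For readers who want to see the filtering idea spelled out, here is a minimal sketch of a variation on that one-liner. It is not the code from the ENCODE issue: `drop_overlaps` is a hypothetical helper name, and the sketch assumes a tab-separated bedGraph that is already sorted by chromosome and start. Unlike the condition `NR>1 && prev_chr==$1`, which would appear to skip the first interval on each chromosome, this version keeps it, and it compares each interval against the last interval it kept rather than the last one it read.

```shell
# Hypothetical helper, not a UCSC or MACS2 tool.
# Reads a sorted, tab-separated bedGraph on stdin and writes to stdout.
# An interval is kept if it is the first one seen on its chromosome, or if
# it starts at or after the end of the last interval that was kept.
drop_overlaps() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 != prev_chr || $2 + 0 >= prev_end {
         print
         prev_chr = $1
         prev_end = $3 + 0
       }'
}

# Example call (file names are placeholders):
#   drop_overlaps < in.sorted.bdg > in.nooverlap.bdg
```

Note that dropping an overlapping interval discards its signal value; whether that is acceptable depends on how rare the overlaps are in the file.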

The second error ("End coordinate 16581 bigger than chrM size of 16571 line 23 of ChIPfile.bedg.sorted") is again likely a quirk of MACS2, which will occasionally produce elements that extend past the end of the chromosome. There are two ways you should be able to resolve the issue:
  1. Use the wigToBigWig utility with the "-clip" option to clip items that extend past the end of a chromosome to the chromosome boundary. You can run wigToBigWig on bedGraph files. 
  2. Run bedClip on these files before passing them to bedGraphToBigWig. By default, bedClip will drop these items that extend past the chromosome boundaries, but with the "-truncate" option it will do the same thing as the "-clip" option for wigToBigWig. 
Both of these utilities can be obtained from our download server: http://hgdownload.soe.ucsc.edu/admin/exe/
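
To illustrate what truncating at the chromosome boundary means for the chrM line in the error, here is a rough awk stand-in. It is a sketch only and not how bedClip or wigToBigWig are implemented: `clip_to_chrom_sizes` is a hypothetical helper name, the sizes file is assumed to be tab-separated with the chromosome name in column 1 and its length in column 2, and dropping intervals on unlisted chromosomes or starting past the chromosome end is a choice made for this sketch.

```shell
# Hypothetical helper, not a UCSC tool.
# Usage: clip_to_chrom_sizes CHROM_SIZES < in.bdg > out.bdg
clip_to_chrom_sizes() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome sizes
       !($1 in size) { next }                  # chromosome not listed: drop
       $2 + 0 >= size[$1] { next }             # starts past the end: drop
       {
         if ($3 + 0 > size[$1]) $3 = size[$1]  # truncate end to chromosome size
         print
       }' "$1" -
}
```

With a chrM size of 16571, an interval ending at 16581 would come out ending at 16571, which is the situation the second error message describes.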

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.


