mergeBed Error


siddha...@gmail.com

Jun 9, 2014, 1:57:33 PM
to bedtools...@googlegroups.com
Hi!

To analyze my ChIP-Seq data, I need a BED file with coordinates for all UCSC genes, with every gene listed just once, i.e. without splice variants listed separately. So I am using mergeBed to merge overlapping entries in the "gene annotation" BED file downloaded from UCSC to generate such a file. I am getting the following error with my command:

$mergeBed -i all_ucsc > genes_ucsc_merged &

$ Error: Sorted input specified, but the file all_ucsc has the following out of order record

chr1 85177038 85257589


I don't know what is wrong with the file. Can you please help me with that?

Thanks,

Sid

Aaron Quinlan

Jun 9, 2014, 2:00:29 PM
to bedtools...@googlegroups.com
Hi Sid,

The merge tool requires that your input be sorted by chromosome and then by start position. In other words, you need to sort your all_ucsc file as follows before using it with merge:

    sort -k1,1 -k2,2n all_ucsc > all_ucsc.sorted
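
A minimal sketch of the whole sort-then-merge workflow, using a made-up three-record BED file (`demo.bed` and its coordinates are hypothetical, not taken from all_ucsc). Setting `LC_ALL=C` is an assumption on my part: it makes `sort` compare bytes, which should match the lexicographic chromosome order bedtools expects by default. `sort -c` only checks the order and exits non-zero if the file is unsorted.

```shell
# Hypothetical demo input: three BED records, deliberately out of order.
printf 'chr2\t100\t200\nchr1\t500\t900\nchr1\t100\t600\n' > demo.bed

# Sort by chromosome (byte order), then numerically by start position.
LC_ALL=C sort -k1,1 -k2,2n demo.bed > demo.sorted.bed

# Check the order without rewriting the file; exits non-zero if unsorted.
LC_ALL=C sort -c -k1,1 -k2,2n demo.sorted.bed && echo "demo.sorted.bed is sorted"

# Merge only if bedtools is installed on this machine.
if command -v bedtools >/dev/null 2>&1; then
    bedtools merge -i demo.sorted.bed > demo.merged.bed
fi
```

Running the check on the original file first is a quick way to confirm that sorting really is the problem before re-running merge.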



--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

burkha...@gmail.com

Jul 10, 2014, 5:00:13 PM
to bedtools...@googlegroups.com
Hello! I'm having the same (or a similar) problem, but I know that my reads are sorted. I've tried both samtools sort on my .bam files and Picard SortSam, to no avail.

My files are different alignments of the same reads to the same reference genome. The files are indexed (samtools index). 

I try:

$: bedtools jaccard -abam bowtie2Alignment/1.rtx700.bowtie2.sorted.bam -b novoAlign/1.rtx700.novo.sorted.bam
Error: Sorted input specified, but the file novoAlign/1.rtx700.novo.sorted.bam has the following out of order record
GL005859.1 98 165 SRR998967.sra.40003860/2 70 +

This is the first read in the bam file I'm using. No matter which pair of bam files I compare, I get an out of order error for the file in b. Is this because it's expecting the file in a differently sorted order? Right now it's by coordinate with contigs first then chromosomes last.


Thanks
Dan

Aaron Quinlan

Jul 11, 2014, 11:44:52 AM
to bedtools...@googlegroups.com
Hi Dan, 

By default, bedtools assumes the records are sorted lexicographically (e.g., chr1, chr10, etc.) by chromosome when using the -sorted option.  I suspect that SortSam has arranged the records more "numerically" (e.g., chr1, chr2, etc.).  If so, you need to provide the jaccard tool with a "genome file" (-g option) defining the expected order. An example is here: 


Now, in writing this, I realized that the help menu for the jaccard tool does not specify that -g is an option.  I am sorry about that, we will fix it for the next release.
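
To illustrate the two orderings, here is a small sketch on made-up chromosome names. `sort -V` (version sort) is assumed to be available, as in GNU coreutils; it only stands in for the "numeric" style of ordering and is not what SortSam itself runs. A genome file is a two-column, tab-separated list of chromosome name and length; the lengths and the file names `demo.genome`, `a.bam`, and `b.bam` below are placeholders.

```shell
# Made-up chromosome names for illustration.
printf 'chr2\nchr10\nchr1\n' > names.txt

# Lexicographic (byte) order, the default assumed for sorted input.
LC_ALL=C sort names.txt > lex.txt
echo "lexicographic: $(paste -sd ' ' lex.txt)"

# "Numeric"-style order, approximated here with GNU version sort.
sort -V names.txt > num.txt
echo "numeric:       $(paste -sd ' ' num.txt)"

# A genome file lists chromosomes in the order the input is sorted,
# one "name<TAB>length" pair per line (lengths here are placeholders).
printf 'chr1\t1000\nchr2\t2000\nchr10\t3000\n' > demo.genome

# Hypothetical invocation; runs only if bedtools and both files exist.
if command -v bedtools >/dev/null 2>&1 && [ -f a.bam ] && [ -f b.bam ]; then
    bedtools jaccard -a a.bam -b b.bam -g demo.genome
fi
```

The two listings differ in where chr10 lands, which is exactly the kind of mismatch that triggers the "out of order record" error when no genome file is given.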

Best,

Daniel Burkhardt

Jul 12, 2014, 10:48:07 AM
to bedtools...@googlegroups.com
Thank you for this tip Aaron. I'm now having a slightly different problem. I've sorted files and retrieved the index from the sorted bam files using samtools and picard.

samtools (or picard) sort alignment.bam > sorted.alignment.bam
samtools idxstats sorted.alignment.bam | cut -f1-2 > sorted.alignment.idx

This index file looks like: 
GL005897.1      1005
GL005896.1      1013
GL005895.1      1022
GL005894.1      1039
GL005893.1      1061
GL005892.1      1066
... all the way down to ...
GL002604.1      8818317
8       55460251
9       59635592
10      60981646
6       62208784
5       62352331
7       64342021
4       68034345
1       73840631
3       74441160
2       77932606
*       0

I'm a little surprised these are sorting by contig length instead of by contig/chromosome, but it shouldn't matter so long as I have the index file, right?
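
As far as I know, `samtools idxstats` reports references in the order of the `@SQ` lines in the BAM header, which is also the order a coordinate sort follows, so the listing probably reflects the contig order of the reference rather than a sort by length. Below is a sketch of building a genome file straight from header lines with awk. The header text is a made-up stand-in for the output of `samtools view -H`, and `demo_header.sam` and `header.genome` are hypothetical file names.

```shell
# Made-up header lines standing in for `samtools view -H sorted.alignment.bam`.
printf '@HD\tVN:1.4\tSO:coordinate\n'  >  demo_header.sam
printf '@SQ\tSN:GL005897.1\tLN:1005\n' >> demo_header.sam
printf '@SQ\tSN:GL005896.1\tLN:1013\n' >> demo_header.sam
printf '@SQ\tSN:8\tLN:55460251\n'      >> demo_header.sam

# Keep only @SQ lines and emit "name<TAB>length" in header order.
awk -F'\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len  = substr($i, 4)
    }
    if (name != "" && len != "") print name "\t" len
}' demo_header.sam > header.genome

cat header.genome
```

Unlike the idxstats-derived file, this one has no trailing `*` line for unmapped reads, since only `@SQ` entries are kept.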

But for some reason I'm still getting this error when I run bedtools merge with the samtools-sorted version:
Error: Sorted input specified, but the file 1.rtx700.bowtie2.sorted.bam has the following out of order record
GL005880.1      13      103     SRR999013.sra.1052443/2 6       -

And this one with the picard-sorted version:
Error: Sorted input specified, but the file 1.rtx700.bowtie2.picard.bam has the following out of order record
GL005887.1 658 748 SRR998967.sra.67822698/2 42 +

I have several different alignments, but they're all getting stopped at different places. None of which are the first read.

All of the index files are in the same order, and seem to be identical when I use

diff alignment1.idx alignment2.idx

Have you ever seen this happen? Should I make a new post about this?

Thanks,
Dan
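
One way to narrow this down is to check a BED-style stream against the genome file independently of bedtools. The awk sketch below is a stand-alone diagnostic, not bedtools' internal check: it reports the first record whose chromosome is missing from the genome file, whose chromosome comes earlier in the genome file than the previous record's, or whose start is lower than the previous start on the same chromosome. The file names and records are made up.

```shell
# Made-up genome file (contigs first, then chromosomes) and BED records.
printf 'GL005897.1\t1005\nGL005896.1\t1013\n8\t55460251\n' > check.genome
printf 'GL005897.1\t10\t100\nGL005896.1\t5\t95\nGL005896.1\t3\t93\n8\t1\t50\n' > check.bed

# First file: remember each chromosome's rank in the genome file.
# Second file: compare each record with the previous one and stop at
# the first violation.
awk -F'\t' '
    NR == FNR { rank[$1] = FNR; next }
    !($1 in rank) { print "unknown chromosome: " $0; exit }
    FNR > 1 && (rank[$1] < prev_rank || (rank[$1] == prev_rank && $2 + 0 < prev_start)) {
        print "out of order: " $0; exit
    }
    { prev_rank = rank[$1]; prev_start = $2 + 0 }
' check.genome check.bed > check.report

cat check.report
```

For a BAM file, the records could first be converted with `bedtools bamtobed -i file.bam` and written to a file in place of `check.bed`. An empty report means the stream is consistent with the genome file under these three rules.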



Aaron Quinlan

Jul 17, 2014, 10:48:31 AM
to bedtools...@googlegroups.com
Hi Dan,

Could you send me a (private) email with minimal files that reproduce the error, as well as the command you are using?  I think I will need to look closely into this.

diegop...@gmail.com

May 14, 2015, 11:23:50 AM
to bedtools...@googlegroups.com
I had this issue too with bedtools intersect. Using -nobuf solved it for me. I guess it has something to do with buffering interfering with the sort order, though I have not confirmed that.

From docs:
"-nobuf - Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time."
http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html

wuxiaoj...@gmail.com

Mar 19, 2017, 10:31:54 AM
to bedtools-discuss, siddha...@gmail.com
The merge tool requires that your input is sorted in chromosome order. In other words, you need to sort your all_ucsc file as follows before using it with merge:

    sort -k1,1 -k2,2n all_ucsc > all_ucsc.sorted
You need to sort all of your input files this way. Then it will work!

On Tuesday, June 10, 2014 at 1:57:33 AM UTC+8, siddha...@gmail.com wrote: