bam header missing from "intersect" output?

518 views
Skip to first unread message

till....@gmail.com

unread,
Jul 20, 2015, 9:42:08 AM7/20/15
to bedtools...@googlegroups.com
Hi all,

I'm tyring to get mapped reads that overlap 100% with a feature. I have the features in a bed file and use the following command to get he overlapping reads with bedtools:

intersectBed -header -g genome_chr.txt -wb -f 1.0 -sorted -a features.bed -b mapped_reads.bam > overlapping_reads.bam

This seems to run fine, however when I try to open the bam file with "samtools view" it complains of a missing header. I tried getting the SAM header from the original mapped_reads.bam file and use samtools reheader, but I also get an error message ("[W::bam_hdr_read] EOF marker is absent. The input is probably truncated. [E::bam_hdr_read] invalid BAM binary header").

I tried with and without the -header option in bedtools, but the result is the same. The input mapped_reads.bam file has a header and works with samtools as expected. Using -bed to output bed instead of bam works, too.
Is this a bug or am I missing something?

Cheers,

Till

Aaron Quinlan

unread,
Jul 20, 2015, 9:51:37 AM7/20/15
to bedtools...@googlegroups.com
Hi Till,

Your bam file would need to be used as the "-a" file not the "-b" file.

Best,
- Aaron
--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

till....@gmail.com

unread,
Jul 20, 2015, 10:39:24 AM7/20/15
to bedtools...@googlegroups.com
Hi Aaron,

thanks for your answer, but I'm not sure I understand. So that bedtools produces a bam file without header is expected behavior in this case, and I have to switch the files to get one with header?
I don't think that would work in my case, because I use -f 1.0 to ensure that every base of the feature is contained in each read in the output, and that's a fraction of file "-a".

Cheers,

Till

Aaron Quinlan

unread,
Jul 20, 2015, 11:18:20 AM7/20/15
to bedtools...@googlegroups.com
Hi Till,

Sorry, I am also confused about what you are trying to do. Do you want a BAM file consisting solely of the reads that encapsulate one or more of the features in your BED file? Or are you trying to create a BAM file of the features?

- Aaron
(Phone. Tyops. Brevity. Sorry.)

till....@gmail.com

unread,
Jul 20, 2015, 11:40:01 AM7/20/15
to bedtools...@googlegroups.com
The features in my bed file are microsatellites, and I want to get their length in the samples that were sequenced (as they differ from the reference). For that I need all the reads from the bam file that overlap the complete microsatellite.
So your first assumption is correct, I want a bam file of all reads (as they are in the original bam file) which overlap the full length of my microsatellite features.

Cheers,

Till

Aaron Quinlan

unread,
Aug 5, 2015, 1:00:12 PM8/5/15
to bedtools...@googlegroups.com
Hi Till,

Sorry for the delay. What you are trying to do isn't easily possible in bedtools, so I just added two new options to the intersect tool that will increase the flexibility of the tool and address the analysis you are working on.

As a complement to the existing -f and -r options, the two new options (-F and -e in blue) are:

-f Minimum overlap required as a fraction of A.
- Default is 1E-9 (i.e., 1bp).
- FLOAT (e.g. 0.50)

-F Minimum overlap required as a fraction of B.
- Default is 1E-9 (i.e., 1bp).
- FLOAT (e.g. 0.50)

-r Require that the fraction overlap be reciprocal for A AND B.
- In other words, if -f is 0.90 and -r is used, this requires
 that B overlap 90% of A and A _also_ overlaps 90% of B.

-e Require that the minimum fraction be satisfied for A OR B.
- In other words, if -e is used with -f 0.90 and -F 0.10 this requires
 that either 90% of A is covered OR 10% of  B is covered.
 Without -e, both fractions would have to be satisfied.

Using these options, the relevant command for your analysis would be:

bedtools intersect -a mapped_reads.bam -b features.bed -g genome_chr.txt -F 1.0 -sorted  > overlapping_reads.bam
 
In other words, we are intersecting the reads against your features and only reporting those reads that entirely overlap the features (via the new -F).

This functionality is only in the Github respoitory (a new release will be available soon), so you will have to clone the repository and compile the source code from scratch to have these options.

I hope this helps.

Aaron

Reply all
Reply to author
Forward
0 new messages