Re: [bedtools-discuss] bamToBed ->.bedpe : Make the read be the 1st read?

Aaron Quinlan

Apr 22, 2013, 11:41:58 AM
to bedtools...@googlegroups.com
Hi Mikel,

The BEDPE output from bamtobed always places the "leftmost" end (i.e., the end with the lowest start coordinate) first.  With paired-end Illumina sequencing, the leftmost end of a properly paired fragment is on the forward strand.  This explains the pattern you see; it is intentional.
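
The ordering rule described above can be illustrated with a minimal sketch. This is not the bedtools source; the `End` type, the `leftmost_first` function, and the coordinates are hypothetical, and the sketch assumes both ends map to the same chromosome.

```python
from typing import NamedTuple


class End(NamedTuple):
    chrom: str
    start: int   # 0-based start, as in BED
    end: int     # end coordinate, as in BED
    strand: str  # "+" or "-"
    mate: int    # 1 = sequenced first, 2 = sequenced second


def leftmost_first(a: End, b: End) -> tuple:
    """Order the two ends so the lower start coordinate comes first.

    Assumes both ends are on the same chromosome; how bedtools treats
    ties or inter-chromosomal pairs is not modeled here.
    """
    return (a, b) if a.start <= b.start else (b, a)


# Hypothetical properly paired fragment: mate 1 maps downstream on "-".
mate1 = End("chr1", 1300, 1350, "-", 1)
mate2 = End("chr1", 1000, 1050, "+", 2)
first, second = leftmost_first(mate1, mate2)
print(first.strand, second.strand, "first block is mate", first.mate)
```

In this made-up pair the forward-strand end is reported first even though it is mate 2, which is why the default output reads "+ -" regardless of which end was sequenced first.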

I gather you'd like the output to be reported such that the first _end_ sequenced is reported first in the output?  If so, I could add such a feature.


On Apr 18, 2013, at 2:59 PM, mikel.za...@gmail.com wrote:

Hi

bamToBed -bedpe outputs only one alignment per pair, the one where bam insert size is >0. For concordant pairs, I take that to mean that it will output only the alignment that is upstream of its pair, in other words, all pairs will have the orientation of the first read as +, second one -. The output seems to confirm this (for the most part; puzzlingly there seem to be a minuscule fraction where the alignments are reported as - +)

Am I understanding this right? If so, is there a way to force bamToBed to report the alignment of the first read (or the second read), instead of the one on the + strand? That way it would be trivial to obtain the bed of all sequenced fragments when the strandedness matters.


Aaron Quinlan

Apr 28, 2013, 9:16:09 PM
to bedtools...@googlegroups.com
Hi Abdullah and Mikel,

I think there is a bit of confusion here.  The BEDPE output does not, in fact, always show the first block to be on the + strand.  It just turns out that because (I assume) you are working with paired-end Illumina data, the vast majority of pairs have the first block on the + strand, since bamtobed -bedpe places the end with the lowest start coordinate first in the output.  The tool does inspect the FLAG to determine the strand of each end.
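
As a minimal sketch of what reading the strand from the FLAG involves (not the bedtools source; `describe_end` is a hypothetical name), the strand and mate number of an alignment can be taken from the standard SAM FLAG bits. Note that flags 99 and 163 both denote a forward-strand alignment; they differ only in whether the read is the first or the second mate.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_REVERSE = 0x10         # read is aligned to the reverse strand
FLAG_FIRST_IN_PAIR = 0x40   # read is the first mate
FLAG_SECOND_IN_PAIR = 0x80  # read is the second mate


def describe_end(flag: int):
    """Return (mate_number, strand) for one alignment's FLAG value."""
    strand = "-" if flag & FLAG_REVERSE else "+"
    if flag & FLAG_FIRST_IN_PAIR:
        mate = 1
    elif flag & FLAG_SECOND_IN_PAIR:
        mate = 2
    else:
        mate = None  # not flagged as either mate
    return mate, strand


# The four flags typical of properly paired reads.
for flag in (99, 147, 163, 83):
    print(flag, describe_end(flag))
```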

It would be possible to override this default behavior with an option that would instead always place the end that was sequenced first as the first block in the output.

Cheers,

On Apr 23, 2013, at 3:54 AM, abdulla...@gmail.com wrote:

Dear Aaron,

I agree with Mikel. I have the same situation with my data. 
All BEDPE outputs show the first block to be on the + strand. However, we would like to use this output to associate reads with genes. If we cannot know the strand of the first end sequenced, it would be difficult to know which gene (the one on the + strand or the one on the - strand) the read should be associated with.
I suggest that you determine the proper strand from the flag value. If the read on the positive strand has flag value 99, then the strand of the pair is +; if its flag value is 163, it should be -, as per this link (http://ppotato.wordpress.com/2010/08/25/samtool-bitwise-flag-paired-reads/).

Do you agree with this solution? If not, please let me know what the best solution is.
Also, when do you think this feature will be available?

Regards,
Abdullah

anton...@gmail.com

Aug 12, 2013, 6:24:50 AM
to bedtools...@googlegroups.com


On Monday, April 29, 2013 10:16:09 AM UTC+9, Aaron Quinlan wrote:

It would be possible to override this default behavior with an option that would instead always place the end that was sequenced first as the first block in the output.


+1 for this.

I am also working with a paired-end protocol on Illumina where this would make perfect sense. Right now I always get the + - orientation and it is a very convoluted process to retroactively figure out the true orientation of the original transcript.

Anton

Aaron Quinlan

Aug 13, 2013, 9:09:07 PM
to bedtools...@googlegroups.com
Hi Anton, Abdullah and Mikel,

I have just pushed changes to the GitHub repository that add a new option, "-mate1", which, when used with the "-bedpe" option, always reports the first mate as the first block in the BEDPE record.  The example below demonstrates this.

$ samtools view -X HG00739.chr20.qrysrt.bam | cut -f 1-9 | less
SRR069529.2276  pPR2    22      37690886        37      35M66S  =       37691221        369
SRR069529.2276  pPr1    22      37691221        37      66S35M  =       37690886        -369
SRR069529.2371  pPR2    22      43639559        37      29M72S  =       43639915        399
SRR069529.2371  pPr1    22      43639915        37      57S44M  =       43639559        -399
SRR069529.2406  pPR2    22      35442032        60      35M66S  =       35442359        367
SRR069529.2406  pPr1    22      35442359        60      60S41M  =       35442032        -367
SRR069529.2994  pPR1    22      23321645        60      46M55S  =       23321999        388
SRR069529.2994  pPr2    22      23321999        60      66S35M  =       23321645        -388

# NOTE: without -mate1, the first block is always the "leftmost" in the genome
$ bedtools bamtobed -i testingData/HG00739.chr20.qrysrt.bam -bedpe  | head -4
22 37690885 37690920 22 37691220 37691255 SRR069529.2276 37 + -
22 43639558 43639587 22 43639914 43639958 SRR069529.2371 37 + -
22 35442031 35442066 22 35442358 35442399 SRR069529.2406 60 + -
22 23321644 23321690 22 23321998 23322033 SRR069529.2994 60 + -

# NOTE: with -mate1, the first block is always the end that was sequenced first.
$ bedtools bamtobed -i testingData/HG00739.chr20.qrysrt.bam -bedpe -mate1 | head -4
22 37691220 37691255 22 37690885 37690920 SRR069529.2276 37 - +
22 43639914 43639958 22 43639558 43639587 SRR069529.2371 37 - +
22 35442358 35442399 22 35442031 35442066 SRR069529.2406 60 - +
22 23321644 23321690 22 23321998 23322033 SRR069529.2994 60 + -
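
For the use case raised earlier in the thread (one stranded interval per sequenced fragment), the `-mate1` output can be collapsed with a short script. The following is a rough sketch, not part of bedtools: `fragment_from_mate1_bedpe` is a hypothetical helper, and it assumes that the fragment's strand equals the strand of mate 1, which depends on the library protocol (some stranded protocols produce the opposite orientation).

```python
def fragment_from_mate1_bedpe(line: str):
    """Collapse one `bamtobed -bedpe -mate1` line into a BED6-style tuple.

    Assumption: the strand of the first block (mate 1) is taken as the
    fragment strand; flip it if your protocol sequences the antisense end first.
    """
    f = line.split()
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    name, score, strand1 = f[6], f[7], f[8]
    if chrom1 != chrom2:
        raise ValueError("ends map to different chromosomes")
    # The fragment spans from the leftmost start to the rightmost end.
    return (chrom1, min(start1, start2), max(end1, end2), name, score, strand1)


# Two lines taken from the -mate1 output shown above.
lines = [
    "22 37691220 37691255 22 37690885 37690920 SRR069529.2276 37 - +",
    "22 23321644 23321690 22 23321998 23322033 SRR069529.2994 60 + -",
]
for line in lines:
    print(*fragment_from_mate1_bedpe(line), sep="\t")
```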

Thanks for the reminder and for your patience.

