On 24 Apr 2014, at 22:20, Aaron Quinlan <aaronq...@gmail.com> wrote:
> On Apr 24, 2014, at 2:22 PM, ebet...@gmail.com wrote:
>> I was wondering if anyone could clarify whether samtools view -s maintains read pairs intact?
The samtools view -s option selects reads according to a score derived from a hash of each read's name. The keep-or-drop decision therefore comes out the same way for all reads with the same name, so read pairs are indeed kept intact.
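As a rough illustration of that idea, here is a minimal Python sketch. It is not the actual samtools code: the CRC32 hash, the function name keep_read and its seed parameter are all stand-ins assumed for the example. The only point is that the decision depends on the read name and nothing else.

```python
import zlib

def keep_read(name: str, fraction: float, seed: int = 0) -> bool:
    """Keep/drop decision from the read name alone (illustrative only)."""
    # CRC32 is an assumed stand-in hash, not what samtools actually uses.
    h = zlib.crc32(f"{seed}:{name}".encode())
    # Map the 32-bit hash to a score in [0, 1) and compare it to the fraction.
    return h / 2**32 < fraction

# Made-up records: both mates of a pair share the same read name.
reads = [("pair1", "mate1"), ("pair1", "mate2"),
         ("pair2", "mate1"), ("pair2", "mate2")]

# Each name is either kept for both mates or dropped for both.
kept = [(name, mate) for name, mate in reads if keep_read(name, 0.5)]
```

Because the score is a pure function of the name (and seed), no bookkeeping about which mates have already been seen is needed to keep pairs together.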
[Aaron wrote:]
> To my knowledge, it does not:
>
> samtools view -b -s 0.01 NA18152.bam \
> | samtools view - \
> | cut -f 1 \
> | sort \
> | uniq -c \
> | head -10
I suspect you've found a bunch of singleton reads in your NA18152.bam, though I don't know why the alphabetically-first reads would all be singletons.
Using a similar test to yours but without the head -10,
    samtools view -s 0.001 foo.bam | cut -f 1 | sort | uniq -c | cut -c1-7 | sort -n | uniq -c
I get numbers like
    880 1
    737008 2
which is a similar proportion of singleton reads to that in the original foo.bam.
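For the numbers above, that is 880 singleton names out of 737,888, or roughly 0.12%. The same tally can be sketched in Python, assuming the read names (column 1 of the samtools view output) have already been collected into a list. The function name multiplicity_histogram and the example names are made up for illustration.

```python
from collections import Counter

def multiplicity_histogram(names):
    """Map 'times a read name occurs' to 'number of names with that count'."""
    per_name = Counter(names)          # equivalent of: sort | uniq -c
    return Counter(per_name.values())  # equivalent of: cut -c1-7 | sort -n | uniq -c

# Made-up read names standing in for real data.
example = ["r1", "r1", "r2", "r2", "r3"]
hist = multiplicity_histogram(example)

# Fraction of distinct read names that occur only once.
singleton_fraction = hist[1] / sum(hist.values())
```

Running this on both the subsample and the original file gives two singleton fractions that can be compared directly.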
John