The difference of pileupcolumn.nsegments, pileupcolumn.get_num_aligned() and samtools depth

509 views
Skip to first unread message

Tianran Yao

unread,
Oct 15, 2018, 12:48:59 AM10/15/18
to Pysam User group
When I tried to deal with the query sequences ( pileupcolumn.get_query_sequences() ) by iterating each position through a bam file.
I found that the result from pileupcolumn.nsegments,   pileupcolumn.get_num_aligned()  and samtools depth are not exactly the same.

For samtools, I used ' samtools depth S1_region.bam '
For pysam, I used
bamfile_af = pysam.AlignmentFile('S1_region.bam', 'rb')
for pileupcolumn in bamfile_af.pileup():
   
print(pileupcolumn.reference_pos, pileupcolumn.nsegments, pileupcolumn.get_num_aligned() )

I found the results from three commands are different.
See result.xls
              
When I changed bamfile_af.pileup() to bamfile_af.pileup(min_base_quality=0 , ignore_overlaps= False), the results from
pileupcolumn.nsegments, pileupcolumn.get_num_aligned() were equal (although they still differed a little from samtools depth).
But another problem happened.
The following code raised an AssertionError
bamfile_af = pysam.AlignmentFile('S1_region.bam', 'rb')
for pileupcolumn in bamfile_af.pileup(min_base_quality=0 , ignore_overlaps= False):
   
print(pileupcolumn.reference_pos, pileupcolumn.nsegments, pileupcolumn.get_num_aligned() )
    seq_lst = pileupcolumn.get_query_sequences() # this will raise an exception


File "/usr/local/lib/python3.5/dist-packages/pysam/libcalignedsegment.cpython-35m-x86_64-linux-gnu.so", line 3041, in pysam.libcalignedsegment.PileupColumn.get_query_sequences

builtins.AssertionError:


Can anyone explain this ?
S1_region.bam
S1_region.bam.bai
result.xls
Message has been deleted

ELZED LIEW

unread,
Oct 17, 2018, 9:53:48 AM10/17/18
to Pysam User group
I ran into the same issue.
But have not been able to fixed it.

One thing that you might need to check is whether the MD tag exists in your sam/bam file.
But i have no other clue other than this because my bam is produced by minimap2 and no MD can be produced by it.

Tianran Yao

unread,
Oct 18, 2018, 9:45:22 PM10/18/18
to pysam-us...@googlegroups.com

I think I figured it out.


1, firstly as I said nsegments and get_num_aligned() are not equal due to the "ignore_overlaps" option. However I don't why “ignore_overlaps” matters that much since pysam manual doesn't mention anything about this.


2, “
samtools depth” will give you the actual depth at the current position, but the number that nsegments and get_num_aligned() give you is how many segments “cover that position”. It means even if there's a deletion or a gap in a segment as long as its reference_start and reference_end cover that position, the segment still “cover that position”. For example for the area in red box, the
nsegments and get_num_aligned() will return 20 while samtools depth will report 10


fig.jpg


3, I still don't know why the exception raised but to workaround it, I changed the codes to:


for pileupcolumn in bamfile_af.pileup

    seq_lst = []

    for pileupread in pileupcolumn.pileups:

        seq_lst.append( pileupread.alignment.query_sequence[pileupread.query_position] ) 








--
You received this message because you are subscribed to the Google Groups "Pysam User group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pysam-user-gro...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tianran Yao

unread,
Oct 18, 2018, 9:54:05 PM10/18/18
to pysam-us...@googlegroups.com


for pileupcolumn in bamfile_af.pileup

    seq_lst = []

    for pileupread in pileupcolumn.pileups:

        if not pileupread.is_del and not pileupread.is_refskip:

            seq_lst.append( pileupread.alignment.query_sequence[pileupread.query_position] ) 

ELZED LIEW

unread,
Oct 23, 2018, 3:51:48 AM10/23/18
to Pysam User group
Haha,
good job. it's always good to check alignments in IGV.
Reply all
Reply to author
Forward
0 new messages