A more efficient way to do pileup?

51 views
Skip to first unread message

Yilei Huang

unread,
Nov 30, 2021, 12:56:55 PM11/30/21
to Pysam User group
Hello,
I need to have a pileup for some pre-defined regions for a chromosome. It would be similar to
samtools mpileup ---positions [bed file] in.bam
where the bed file gives a list of pre-defined regions. I want to know how many reads aligned to a position are of A/C/G/T each. 

My current code is
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------sf = pysam.AlignmentFile(path2bam, "rb")
for i, p in enumerate(pos):
for pileupcolumn in sf.pileup(ch, p-4, p+5, min_mapping_quality=minMapQual, min_base_quality=minBaseQual, truncate=True):
basePos = pileupcolumn.pos
for pileupread in pileupcolumn.pileups:
query_pos = pileupread.query_position
if not pileupread.is_del and not pileupread.is_refskip:
baseCall = pileupread.alignment.query_sequence[query_pos]
# the rest of the code is omitted here
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

However I found this is pretty slow (a lot slower than samtools's mpileup command line tool). I wonder if there could be a more efficient way to do what I want.Thanks!
Reply all
Reply to author
Forward
0 new messages