A more efficient way to do pileup?

52 views

Skip to first unread message

Yilei Huang

unread,

Nov 30, 2021, 12:56:55 PM11/30/21

to Pysam User group

Hello,

I need to have a pileup for some pre-defined regions for a chromosome. It would be similar to

samtools mpileup ---positions [bed file] in.bam

where the bed file gives a list of pre-defined regions. I want to know how many reads aligned to a position are of A/C/G/T each.

My current code is

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------sf = pysam.AlignmentFile(path2bam, "rb")

    for i, p in enumerate(pos):
        for pileupcolumn in sf.pileup(ch, p-4, p+5, min_mapping_quality=minMapQual, min_base_quality=minBaseQual, truncate=True):
            basePos = pileupcolumn.pos
            for pileupread in pileupcolumn.pileups:
                query_pos = pileupread.query_position
                if not pileupread.is_del and not pileupread.is_refskip:
                    baseCall = pileupread.alignment.query_sequence[query_pos]
# the rest of the code is omitted here
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

However I found this is pretty slow (a lot slower than samtools's mpileup command line tool). I wonder if there could be a more efficient way to do what I want.Thanks! 

Reply all

Reply to author

Forward

0 new messages