pileup with RNA-seq

272 views
Skip to first unread message

tkd

unread,
Jun 27, 2011, 3:49:51 PM6/27/11
to Pysam User group
Hello,

Thanks for developing pysam. Very useful.

I'm having an issue using Samfile.pileup() to estimate coverage using
aligned RNA-seq data.

Using code like the following:

samfile = pysam.Samfile( "rnaseq.bam", "rb" )
pileupIter = samfile.pileup(chrName, start, end)
for pileup in pileupIter:
if pileup.pos>=start and pileup.pos<=end:
print chrName + " " + str(pileup.pos) + ":" + str(pileup.n)


The output agrees with samtools mpileup for some values of chrName/
start/end:

pysam:
chr3 50046995:7773
chr3 50046996:7795
chr3 50046997:7813
chr3 50046998:7794
chr3 50046999:7709
chr3 50047000:7672
chr3 50047001:7632
chr3 50047002:7598
chr3 50047003:7489
chr3 50047004:7384
chr3 50047005:7232

samtools:
chr3 50046996 7773
chr3 50046997 7795
chr3 50046998 7813
chr3 50046999 7794
chr3 50047000 7709
chr3 50047001 7672
chr3 50047002 7632
chr3 50047003 7598
chr3 50047004 7489
chr3 50047005 7384
chr3 50047006 7256


But disagrees for other values of chrName/start/end:

pysam:
chr3 50050799:2182
chr3 50050800:2182
chr3 50050801:2182
chr3 50050802:2182
chr3 50050803:2182
chr3 50050804:2182
chr3 50050805:2182
chr3 50050806:2182
chr3 50050807:2182
chr3 50050808:2182
chr3 50050809:2182

samtools:
chr3 50050800 5
chr3 50050801 5
chr3 50050802 5
chr3 50050803 5
chr3 50050804 5
chr3 50050805 5
chr3 50050806 5
chr3 50050807 5
chr3 50050808 5
chr3 50050809 5
chr3 50050810 5


In cases where pysam disagrees with samtools, visualizing the bam file
in the IGV validates the samtools results.

Are these discrepancies specific to RNA-seq data? There was another
post that mentioned that RNA-seq might behave strangely with pileup().

thanks,

Chris

Andreas Heger

unread,
Jun 28, 2011, 2:56:23 AM6/28/11
to pysam-us...@googlegroups.com
Hi Chris,

thanks, could you provide a small bam-file that would replicate
the problem. Please also send the pysam and samtools version and
I will have look.

Pileup in pysam is a bit shaky as I am no using it myself. There
is some read-filtering going in within samtools that I have hoped to
have emulated in pysam, but probably incompletely. It is also based on
some deprecated code within samtools and is really asking for a
re-write.

Best wishes,
Andreas

tkd

unread,
Jun 30, 2011, 5:51:03 PM6/30/11
to Pysam User group
Hi Andreas,

I sent the files you requested to your gmail account.

thanks,

Chris

Andreas Heger

unread,
Jul 3, 2011, 2:50:36 PM7/3/11
to pysam-us...@googlegroups.com
Hi Chris,

thanks for bringing this up. Samtools pileup replicates the old
and now deprecated samtools pileup functionality. I don't know
enough about mpileup to comment on the differences right now
(probably different read processing/filtering). Implementing
mpileup is on our todo list.

Best wishes,
Andreas

With pileup the output of samtools and pysam are
consistent:

samtools pileup -r chr3:50050800-50050810 test.bam | awk '$2 >= 50050799
&& $2 <= 50050809 { print $1"\t"$2"\t"$4 }'

chr3 50050799 2182
chr3 50050800 2182
chr3 50050801 2182
chr3 50050802 2182
chr3 50050803 2182
chr3 50050804 2182
chr3 50050805 2182
chr3 50050806 2182
chr3 50050807 2182
chr3 50050808 2182

chr3 50050809 218

tkd

unread,
Jul 6, 2011, 7:15:42 PM7/6/11
to Pysam User group
Hi Andreas,

OK. Thanks for checking that.

regards,

Chris
Reply all
Reply to author
Forward
0 new messages