Creating a fastq directly from an AlignedSegment

413 views
Skip to first unread message

Jane Hawkey

unread,
Jul 8, 2015, 1:08:03 AM7/8/15
to pysam-us...@googlegroups.com
Hi,

I was wondering if there is a way to take an AlignedSegment from the AlignmentFile and instead of writing that out in SAM/BAM format, convert it into the fastq class and write it out directly as a fastq instead?

Jane Hawkey

unread,
Jul 8, 2015, 1:35:28 AM7/8/15
to pysam-us...@googlegroups.com
The reason I ask is because I am currently having trouble writing out a SAM/BAM file with the appropriate header. Even after setting either template, header or both in the AlignmentFile command, no header is added to my output.

eg:
    samfile = pysam.AlignmentFile(sam_file, "r")
    left_clipped_reads = pysam.AlignmentFile(left_clipped_bam, "wb", template=samfile)
    right_clipped_reads = pysam.AlignmentFile(right_clipped_bam, "wb", template=samfile)

Jane Hawkey

unread,
Jul 8, 2015, 1:36:12 AM7/8/15
to pysam-us...@googlegroups.com
Version of pysam is 0.8.3

Florian Finkernagel

unread,
Jul 8, 2015, 2:39:41 AM7/8/15
to pysam-us...@googlegroups.com
Creating a fastq is as easy as 
for read in samfile.fetch():
  op.write(">%s\n%s\n+\n%s\n"  % (read.name, read.query_sequence, "".join([chr(ord(x) + 33)) for x in read.query_qualities)

How are you checking for the header? I was under the impression that BAM files always have a header, samtools doesn't show it if you don't ask for it via -h or -H though.

So long
Florian

Jane Hawkey

unread,
Jul 8, 2015, 6:13:48 AM7/8/15
to pysam-us...@googlegroups.com
Thanks for that, the sticking point was trying to work out how to convert the quality scores into the appropriate format.

I'm fairly certain there is no header because samtools complains when I try and use the resulting bam file in downstream analysis (gives me a no header found error). To try and check what was happening, I wrote the file out as a sam file instead and when I opened it the header was missing.

Florian Finkernagel

unread,
Jul 9, 2015, 3:19:00 AM7/9/15
to pysam-us...@googlegroups.com
Indeed, for a SAM to have a header you need to open it with 'wh'. The 'h' is unnecessary for BAM files (and will lead to an exception if passed).

There must be something else going on, since BAM files always have a header.
Perhaps your downstream tool is assuming a SAM file, tries to read the BAM file, and fails, since obviously there is no valid SAM header in there?
Reply all
Reply to author
Forward
0 new messages