Converting FASTQ quality scores

mbendall

Oct 12, 2011, 2:45:18 PM
to gnumap-users
Hello-

Is there a way to tell GNUMAP not to convert FASTQ quality scores?

We are using data from the Illumina 1.5+ pipeline, whose quality scores
are Phred+64 encoded: Phred 3-40 in ASCII 67-104, plus 'B' (ASCII 66,
Q2) as the read-segment quality-control indicator. We convert the
qseq.txt files to Sanger-encoded FASTQ files for use in GNUMAP and other
aligners, then run the alignments through a few further QC steps
(deduping, local indel realignment) before variant calling.
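For the quality column, the qseq-to-Sanger step described above amounts to moving the ASCII offset from 64 to 33. The thread does not say which tool performed that conversion, so the sketch below simply assumes a plain offset shift; the function name `phred64_to_phred33` is hypothetical. It maps 'B' (Q2) to '#' (Q2) and does not give the quality-control indicator any special treatment.

```python
# Assumption: a plain offset shift; the thread does not name the converter used.
PHRED64_OFFSET = 64  # Illumina 1.3+/1.5+ quality offset
PHRED33_OFFSET = 33  # Sanger quality offset


def phred64_to_phred33(qual: str) -> str:
    """Re-encode an Illumina 1.5+ (Phred+64) quality string as Sanger (Phred+33).

    Both encodings store the same Phred score, so each character moves
    down by 64 - 33 = 31 ASCII positions.
    """
    out = []
    for ch in qual:
        q = ord(ch) - PHRED64_OFFSET  # decode the Phred score
        if q < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        out.append(chr(q + PHRED33_OFFSET))  # re-encode with the Sanger offset
    return "".join(out)


illumina = "hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb"
print(phred64_to_phred33(illumina))
```

Applied to the Illumina 1.5+ quality string quoted in this thread, it should reproduce the Sanger string that was given to GNUMAP and BWA.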

We are having difficulty because the SAM files produced by GNUMAP change
the quality scores of the original data. We are using different aligners,
then running the alignments through the same downstream analysis, so
it is important to keep the original quality scores. Ideally, we would
like to input Sanger-encoded FASTQ and output Sanger-encoded SAM files.

Here is an example:

ORIGINAL Sanger encoded FASTQ:
IIIIIIIIIIIIIIIIIHIHIHIIIIIIIIEHFFIIHHIGHHCID@DD?DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (no flag):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (using --illumina):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?DBEHHHGHGGFGGGDGEGECGGGDFC
BWA output:
IIIIIIIIIIIIIIIIIHIHIHIIIIIIIIEHFFIIHHIGHHCID@DD?DBEHHHGHGGFGGGDGEGECGGGDFC

BWA did not change the quality scores, so they are still Sanger
encoded. GNUMAP is altering the quality scores (perhaps by converting
to Solexa encoding, or through some internal recalibration?).
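The side-by-side comparison above can be made mechanical by decoding both strings as Phred+33 and tallying which scores changed. The following is a minimal sketch, and the helper names are hypothetical. On the read above, the only change it reports is Q40 ('I') becoming Q39 ('H'), while every other position is untouched. That pattern looks more like a cap at Q39 than a change of encoding.

```python
def decode_phred33(qual: str) -> list[int]:
    """Turn a Sanger (Phred+33) quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in qual]


def quality_changes(before: str, after: str) -> dict[tuple[int, int], int]:
    """Count each (original score, new score) pair at positions that differ."""
    if len(before) != len(after):
        raise ValueError("quality strings must have the same length")
    changes: dict[tuple[int, int], int] = {}
    for q_in, q_out in zip(decode_phred33(before), decode_phred33(after)):
        if q_in != q_out:
            changes[(q_in, q_out)] = changes.get((q_in, q_out), 0) + 1
    return changes


# Quality strings from the example above, with the wrapped lines rejoined.
original = ("IIIIIIIIIIIIIIIIIHIHIHIIIIIIIIEHFFIIHHIGHHCID@DD?"
            "DBEHHHGHGGFGGGDGEGECGGGDFC")
gnumap = ("HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?"
          "DBEHHHGHGGFGGGDGEGECGGGDFC")
print(quality_changes(original, gnumap))
```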

I also tried it with Illumina 1.5+-encoded FASTQ (the quality string
in the FASTQ is identical to the quality string in the qseq.txt file).

Illumina 1.5+ encoded FASTQ:
hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb
GNUMAP output (using --illumina):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?
DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (no flag, shown for comparison purposes):
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
BWA output:
hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb

BWA's output quality is still Illumina encoded (BWA may have a flag that
outputs Sanger). GNUMAP converts the quality to Sanger encoding, but
may be doing so incorrectly for Illumina 1.5+ input.
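Both GNUMAP outputs for the Illumina 1.5+ input can be reproduced by two simple guesses. These guesses are inferred only from the strings posted here and do not come from GNUMAP's source. With --illumina, the Phred+64 characters may be decoded correctly and then capped at Q39, which is the same cap seen with Sanger input. Without a flag, they may be read as Phred+33. In that case every score comes out above 60 and is clamped to Q40, which encodes as 'I'. The `reencode` helper is hypothetical.

```python
# Hypothetical reconstructions inferred only from the outputs in this thread;
# they are not taken from GNUMAP's source code.
PHRED33, PHRED64 = 33, 64


def reencode(qual: str, src_offset: int, dst_offset: int,
             q_min: int = 0, q_max: int = 40) -> str:
    """Decode qual with src_offset, clamp each score to [q_min, q_max],
    and re-encode it with dst_offset."""
    scores = (ord(ch) - src_offset for ch in qual)
    return "".join(chr(min(max(q, q_min), q_max) + dst_offset) for q in scores)


illumina = "hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb"

# Guess A (--illumina): Phred+64 decoded correctly, then capped at Q39.
guess_a = reencode(illumina, PHRED64, PHRED33, q_max=39)
# Guess B (no flag): Phred+64 characters misread as Phred+33, capped at Q40.
guess_b = reencode(illumina, PHRED33, PHRED33, q_max=40)
print(guess_a)
print(guess_b)
```

If guess A holds, the --illumina conversion itself is offset-correct, and the remaining difference from the input is the same Q39 cap seen with Sanger input.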

Thanks!

Matthew

Nathan Clement

Oct 29, 2011, 8:57:42 PM
to gnumap...@googlegroups.com
Matt,

I think I've fixed both of those problems. If the input is FASTQ, GNUMAP now writes the same FASTQ quality string to the SAM output file. I also fixed a bug with the --illumina flag; it appears there have been quite a few changes to the Illumina format since I implemented it.
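One way to confirm that a build passes qualities through unchanged is to compare each SAM record's QUAL column (field 11, Phred+33 in the SAM format) with the quality line of the matching FASTQ record. The following is a minimal sketch, and the function names are hypothetical. It assumes four-line FASTQ records and read names that match exactly between FASTQ and SAM, with no /1 or /2 suffix handling. For reverse-strand alignments (FLAG bit 0x10), SAM stores the quality string reversed, so the sketch reverses the FASTQ quality before comparing.

```python
def fastq_quals(lines):
    """Map read name -> quality string for 4-line FASTQ records."""
    rows = [ln.rstrip("\n") for ln in lines]
    quals = {}
    for i in range(0, len(rows) - 3, 4):
        header, _seq, _plus, qual = rows[i:i + 4]
        name = header[1:].split()[0]  # drop the leading '@' and any comment
        quals[name] = qual
    return quals


def mismatched_reads(fastq_lines, sam_lines):
    """Return names of reads whose SAM QUAL differs from the FASTQ quality."""
    quals = fastq_quals(fastq_lines)
    bad = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, qual = fields[0], int(fields[1]), fields[10]
        if qual == "*" or qname not in quals:
            continue  # no quality stored, or read not in this FASTQ
        expected = quals[qname]
        if flag & 0x10:  # reverse strand: SAM stores QUAL reversed
            expected = expected[::-1]
        if qual != expected:
            bad.append(qname)
    return bad


fastq = ["@r1\n", "ACGT\n", "+\n", "IIH#\n"]
sam = ["@HD\tVN:1.0\n",
       "r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t#HII\n"]
print(mismatched_reads(fastq, sam))  # []
```

Both functions accept any iterable of lines, so open file objects for the FASTQ input and the SAM output can be passed in directly.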

Let me know if there are any further issues. The new version can be obtained from http://dna.cs.byu.edu/gnumap/gnumap-3.0.2.tgz

Nathan

Matthew Bendall

Nov 2, 2011, 11:42:51 AM
to gnumap...@googlegroups.com
Nathan-

Just ran a few analyses; everything looks great. Thanks!

Matthew