mbendall
Oct 12, 2011, 2:45:18 PM
to gnumap-users
Hello,
Is there a way to tell GNUMAP not to convert FASTQ quality scores?
We are using data from the Illumina 1.5+ pipeline, whose quality scores
are Phred values from 3 to 40 encoded as ASCII 66-104. We convert the
qseq.txt files to Sanger-encoded FASTQ files for use in GNUMAP and other
aligners, then run the alignments through a few other QC steps
(deduplication, local indel realignment) before variant calling.
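For reference, the qseq-to-Sanger conversion described above amounts to shifting every quality character from the Phred+64 offset used by Illumina 1.5+ to the Phred+33 offset used by Sanger FASTQ. The sketch below assumes exactly that (offset 64 in, offset 33 out); the function name is hypothetical and not part of GNUMAP or any particular conversion tool. Note that a plain shift maps Illumina's special 'B' (Q2) character to '#' without any further handling.

```python
ILLUMINA15_OFFSET = 64  # Phred+64, as used by Illumina 1.3+/1.5+ pipelines
SANGER_OFFSET = 33      # Phred+33, as used by Sanger FASTQ and SAM


def illumina15_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (hypothetical helper)."""
    out = []
    for ch in qual:
        phred = ord(ch) - ILLUMINA15_OFFSET
        if phred < 0:
            raise ValueError(f"{ch!r} is below the Phred+64 range")
        out.append(chr(phred + SANGER_OFFSET))
    return "".join(out)


if __name__ == "__main__":
    # Excerpt of the Illumina 1.5+ quality string quoted in this post
    print(illumina15_to_sanger("dgeehhgghfggbhc_cc^"))
```

On that excerpt of the Illumina 1.5+ string quoted later in this message, the function returns `EHFFIIHHIGHHCID@DD?`, the matching segment of the original Sanger string.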
We are having difficulty because the SAM files produced by GNUMAP change
the quality scores of the original data. We use several different
aligners and run all of their alignments through the same downstream
analysis, so it is important to preserve the original quality scores.
Ideally, we would like to input Sanger-encoded FASTQ and get
Sanger-encoded SAM files out, with the scores unchanged.
Here is an example:
ORIGINAL Sanger encoded FASTQ:
IIIIIIIIIIIIIIIIIHIHIHIIIIIIIIEHFFIIHHIGHHCID@DD?DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (no flag):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?
DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (using --illumina):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?
DBEHHHGHGGFGGGDGEGECGGGDFC
BWA output:
IIIIIIIIIIIIIIIIIHIHIHIIIIIIIIEHFFIIHHIGHHCID@DD?DBEHHHGHGGFGGGDGEGECGGGDFC
BWA did not change the quality scores, so they are still in Sanger
encoding. GNUMAP, with or without --illumina, changes them: every Q40
('I') becomes Q39 ('H'), while all lower scores are unchanged. Perhaps
it is using Solexa encoding or some internal recalibration?
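One way to see exactly what GNUMAP changed is to decode both Sanger strings (Phred+33) and list the positions whose scores differ. This is a minimal sketch with hypothetical helper names, not a GNUMAP feature; it uses the segment starting at `EHFF` from the original string and from the GNUMAP output above.

```python
from typing import List, Tuple

SANGER_OFFSET = 33  # Phred+33


def decode_sanger(qual: str) -> List[int]:
    """Convert a Phred+33 quality string to a list of Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in qual]


def quality_diffs(before: str, after: str) -> List[Tuple[int, int, int]]:
    """Return (0-based position, old score, new score) for every changed position."""
    if len(before) != len(after):
        raise ValueError("quality strings differ in length")
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(decode_sanger(before), decode_sanger(after)))
        if old != new
    ]


if __name__ == "__main__":
    original = "EHFFIIHHIGHHCID@DD?"  # segment of the original Sanger string
    gnumap = "EHFFHHHHHGHHCHD@DD?"    # same segment of the GNUMAP output
    for pos, old, new in quality_diffs(original, gnumap):
        print(f"position {pos}: Q{old} -> Q{new}")
```

On this segment every reported change is Q40 -> Q39 (positions 4, 5, 8 and 13 within the segment).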
I also tried it with FASTQ in Illumina 1.5+ encoding (the quality string
in the FASTQ is identical to the one in the qseq.txt file).
Illumina 1.5+ encoded FASTQ:
hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb
GNUMAP output (using --illumina):
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHFFHHHHHGHHCHD@DD?
DBEHHHGHGGFGGGDGEGECGGGDFC
GNUMAP output (no flag, shown for comparison purposes):
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
BWA output:
hhhhhhhhhhhhhhhhhghghghhhhhhhhdgeehhgghfggbhc_cc^cadgggfgffefffcfdfdbfffceb
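BWA quality is still Illumina-encoded (BWA may have a flag that
outputs Sanger). GNUMAP converts the quality to Sanger encoding, but
the result is identical to its output for the Sanger input above,
with every Q40 ('I') again turned into Q39 ('H'), so it may not be
converting Illumina 1.5+ scores correctly either.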
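Since both input files should carry the same Phred scores, a quick sanity check is to decode each with its own offset (64 for the Illumina 1.5+ string, 33 for the Sanger string) and compare. The sketch below assumes those offsets and uses a hypothetical helper name. If the scores agree, an unmodified Sanger conversion of either input should reproduce the original Sanger string, so a remaining difference in the aligner's output would point to a change made inside the aligner rather than to the input encoding.

```python
from typing import List


def decode(qual: str, offset: int) -> List[int]:
    """Decode an ASCII quality string with the given Phred offset."""
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    illumina15 = "dgeehhgghfggbhc_cc^"  # excerpt of the Illumina 1.5+ string (Phred+64)
    sanger = "EHFFIIHHIGHHCID@DD?"      # matching excerpt of the Sanger string (Phred+33)
    same = decode(illumina15, 64) == decode(sanger, 33)
    print("same Phred scores:", same)
```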
Thanks!
Matthew