split_libraries_fastq error: Failed qual conversion

Jocelyn Sietsma Penington

Aug 3, 2016, 9:48:27 PM
to qiime...@googlegroups.com
Hi,
I have a split_libraries_fastq.py command that worked when I ran it last year:
  split_libraries_fastq.py -i reads.fastq -b barcodes.fastq -m $DATADIR/mapping${run}.txt \
  --barcode_type 16  -p 0.90 -q 29 -o labelled_hiqual

I want to re-run it with --store_demultiplexed_fastq so that I can compare the output qualities with another run,
but this time I get the error:

  File "/usr/local/bioinfsoftware/python/current/bin/split_libraries_fastq.py", line 391, in <module>
    main()
  File "/usr/local/bioinfsoftware/python/current/bin/split_libraries_fastq.py", line 370, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/local/bioinfsoftware/python/current/lib/python2.7/site-packages/qiime-1.9.1.dev0-py2.7.egg/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/usr/local/bioinfsoftware/python/current/lib/python2.7/site-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M01176:175:000000000-AJRBC:1:1101:15763:1979 1:N:0:1. This may be because you passed an incorrect value for phred_offset.


That seq id is the first in the files. 

I get the same error message with and without --store_demultiplexed_fastq, with --phred_offset 33 and with --phred_offset 64, and with and without -q 20.


Looking at the code that generated the message, line 174 of fastq.py, the line that is triggering the error is

if enforce_qual_range and ((qual < 0).any() or (qual > 62).any())

which is concerning because the output of FastQC on reads.fastq shows quality scores up to 76
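To confirm what FastQC reports, the decoded score range can be checked directly. The following is only a minimal sketch, not part of QIIME or scikit-bio: the helper name `quality_range` is hypothetical, and it assumes a plain four-line-per-record FASTQ file with no wrapped sequence or quality lines.

```python
def quality_range(lines, phred_offset=33):
    """Return (min, max) decoded quality over a four-line-per-record FASTQ stream.

    Hypothetical helper for illustration; assumes every 4th line is a quality line.
    """
    lo, hi = None, None
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        scores = [ord(c) - phred_offset for c in line.rstrip("\r\n")]
        if not scores:
            continue
        lo = min(scores) if lo is None else min(lo, min(scores))
        hi = max(scores) if hi is None else max(hi, max(scores))
    return lo, hi


# Made-up record: 'm' is ASCII 109, which decodes to 76 with offset 33.
record = ["@read1\n", "ACGT\n", "+\n", "#5Im\n"]
print(quality_range(record, phred_offset=33))
```

On a real file this could be called as `quality_range(open("reads.fastq"))`. A maximum above 62 with offset 33 would be consistent with the range check quoted above rejecting the file.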


Edit: I have attached the output of qiime_config -t, and the 1st read and barcode, which are enough to show the error.


Thanks for your time,


Jocelyn

qiime_config_test4Aug.txt
barcode1.fastq
read1.fastq

Jocelyn Sietsma Penington

Aug 4, 2016, 1:16:01 AM
to Qiime 1 Forum
I see that one part of the error chain is 'process_fastq_single_end_read_file'.
I think this might be part of the problem: the data is assembled from paired-end reads before I give it to split_libraries_fastq, and PEAR seems to have summed the quality scores (the original single-end qualities are <= 40). Is this new in qiime 1.9.1?

justink

Aug 4, 2016, 2:08:12 AM
to Qiime 1 Forum
First, thanks for the detailed message.

That sounds like it might be a bug. Does anyone know why we enforce a qual range in skbio? I don't think we're running out of ascii chars for phred+33 ...

And afaik, there's no maximum Phred score; a higher score just adds more 9s to 99.9999...%

In the meantime, perhaps comment out that section of code and see if it works? Alternatively, we could probably go through your input fastq file and change any quality score >62 to exactly 62. That's basically a "we're sure of this nucleotide" anyway, and we don't use quality scores in much of the downstream analysis (could you imagine doing PCoA that took into account sequence quality? My brain hurts).
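The capping idea could be sketched roughly as below. This is an untested-on-real-data illustration, not a QIIME tool: the function names are hypothetical, and it assumes Phred offset 33 and a four-line-per-record FASTQ file.

```python
def cap_quality_line(qual, phred_offset=33, max_score=62):
    """Replace every score above max_score with max_score in one quality string."""
    cap_char = chr(max_score + phred_offset)  # '_' for offset 33 and a cap of 62
    return "".join(
        cap_char if ord(c) - phred_offset > max_score else c for c in qual
    )


def cap_fastq_lines(lines, phred_offset=33, max_score=62):
    """Yield FASTQ lines unchanged, except quality lines, which are capped.

    Assumes four lines per record (no wrapped sequence or quality lines).
    """
    for i, line in enumerate(lines):
        if i % 4 == 3:
            capped = cap_quality_line(line.rstrip("\n"), phred_offset, max_score)
            yield capped + "\n"
        else:
            yield line
```

A possible way to apply it, with `reads_capped.fastq` as a made-up output name:

```python
with open("reads.fastq") as src, open("reads_capped.fastq", "w") as dst:
    dst.writelines(cap_fastq_lines(src))
```

Writing to a new file rather than editing in place keeps the original PEAR output available for the quality comparison.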

Jai Ram Rideout

Aug 4, 2016, 1:05:21 PM
to Qiime 1 Forum
I don't think this is a bug. Technically there is no upper bound to Phred scores, but with the FASTQ format, there is an imposed Phred score range due to how Phred scores are encoded in the file format. See this section of the scikit-bio FASTQ docs for details.

That being said, I don't understand why/how the quality scores are being decoded outside the range [0, 62]. Jocelyn, are you running the command on the same exact data you analyzed a year ago, or have you preprocessed your data differently this time around? What version of QIIME did you use then, and what version are you using now? Can you attach a FASTQ file with the first few reads?

Thanks,
Jai

Jocelyn Sietsma Penington

Aug 5, 2016, 1:23:34 AM
to Qiime 1 Forum
Hi Jai,

The input files are the same now as last year.
Last year I used qiime 1.8.0; this time I used 1.9.1.
I previously attached the first read and barcode as 2 FASTQ files.

The raw reads have Phred-scores <= 41, but the output of the PEAR program to assemble paired-end reads has Phred-scores up to a max of about 80, and this is the input I used. I presume where 2 calls agree the scores are summed?
Since posting I ran successfully with qiime 1.8.0, so I have the data I needed for comparison of qualities now.

Jocelyn

Colin Brislawn

Aug 5, 2016, 2:21:46 PM
to Qiime 1 Forum
Hello Jocelyn,

> I presume where 2 calls agree the scores are summed?

That's a great question!
 
This paper describes how PEAR combines q scores: http://bioinformatics.oxfordjournals.org/content/30/5/614.full
This paper describes how q scores should be combined: http://bioinformatics.oxfordjournals.org/content/31/21/3476.full.html 

Colin

PS Let me know if you can view these papers, or if I should send you the PDFs.

Jai Ram Rideout

Aug 8, 2016, 3:13:46 PM
to Qiime 1 Forum
Hi Jocelyn,

Thanks for the details! I think I know what's going on:

QIIME 1.8.0's FASTQ reader did not check the range of decoded quality scores. QIIME 1.9.1's FASTQ reader (implemented in scikit-bio) checks that quality scores fall in the range [0, 62]. The FASTQ reader in 1.9.1 assumes it is decoding Illumina 1.3 (phred offset 64) or Illumina 1.8 (phred offset 33), which both have decoded quality scores in range [0, 62]. QIIME 1.9.1 is using an old version of scikit-bio's FASTQ reader; newer versions support other types of encoding schemes and arbitrary Phred offsets.
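A small sketch of why neither --phred_offset value helped. It only mimics the [0, 62] check described above and is not scikit-bio's actual code; the function name and the two quality strings are made up for illustration.

```python
def passes_range_check(qual, phred_offset, lo=0, hi=62):
    """Return True if every decoded score in qual falls within [lo, hi]."""
    scores = [ord(c) - phred_offset for c in qual]
    return all(lo <= s <= hi for s in scores)


raw = "#5IJ"     # decodes to 2, 20, 40, 41 with offset 33 (like the raw reads)
merged = "#5Im"  # decodes to 2, 20, 40, 76 with offset 33 (like merged output)

for offset in (33, 64):
    print(offset, passes_range_check(raw, offset), passes_range_check(merged, offset))
```

With offset 33 the merged string fails because 76 exceeds 62. With offset 64 it fails because '#' decodes to -29. So a file that mixes low scores with scores above 62 cannot pass the check under either offset.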

I'm not familiar with PEAR, and after a quick search through their docs and paper I wasn't able to find details on how PEAR encodes output quality scores. My guess is that they are using a Phred offset of 33, with quality scores not constrained to the Illumina range of [0, 62]. For now, I think your workaround (using QIIME 1.8.0 with Phred offset 33) is the best way to go. This shouldn't be an issue in QIIME 2.

Best,
Jai