Problem: "undecided fastq version" with HiSeq 4000 data

sabrina.ber...@gmail.com

May 30, 2020, 9:20:16 AM
to LotuS rRNA pipeline
Hi,

I tried to run 6 samples generated on a HiSeq 4000, but the following message appears while the fastq files are being read: "undecided fastq version". Is it linked to the HiSeq 4000 format? Do I need to convert it?

Thanks for your help

saby

Falk Hildebrand

Jun 1, 2020, 2:55:43 PM
to LotuS rRNA pipeline
Dear Saby,
This is indeed a bit weird; normally LotuS/sdm can handle HiSeq 4000 data, at least the data I have had so far. Is the run finishing? Do the numbers of filtered reads in the sdm logs (lotuslogs dir) look somewhat normal (i.e. I would normally expect >70% of the reads to remain)?
best, Falk

sabrina.ber...@gmail.com

Jun 2, 2020, 10:22:29 AM
to LotuS rRNA pipeline
Hi Falk,


No reads are processed at all. We ran it with sdm_hiSeq. I tested the FastqVersion parameter by setting it to 1 and then to 2. A message appeared during the run: "Unusually low sloexa quality score (-20); setting to 0."

Below is the head of the fastq file (we retrieved it from the ENA archive, the study accession is PRJNA480846; we intend to re-analyze the data). I never had any problem with LotuS runs before when processing MiSeq or 454 data. It's the first time I am using HiSeq data. Could you tell me if there are any specific parameters to apply in the sdm_hiSeq file, please?

@SRR7515929.1 1/1
AATTATTGCNGTTAATTTCTAACCGGCCTTCGTNNCGNNNATCAGGATCAAATACGGTCATGATTAACGTTGCCTTCGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNAANNNCNTTNCNTCTGTCCATGGTGAAGCAGTGNTCAAATCCAT
+
A<FFFKKFF#KKKKA<7FKKKFKKKKK(<7FAK##7F###FA<,FAFKFA,AKKKKFFAA<AFFKAF,7FKFF7FFKKK############################A,###A#,7#A#FKKA77,,<FAFKK<7A,<F7#FK,,,<A,<,
@SRR7515929.2 2/1
CTAAATGTTCCAAAAAAGCTCTGATAACGGGTTGATTTACCCTGCTTTTCCCCTGGGGTGGGGTGTTTTTTTCAGATAGCTANGNNNNNNNNNNNNNNNNNNNNANAACTTTGTTTGTTGGGTGGATTACGGGCATGATCAGCCGTTGGTC
+
AA<AAAK<FAFKA<FA7,FFKKKKFKA,AFA,AFFFFKF,,77FFK7FK,AKFA7AFAAFFFFKFKFAFKKF7KKF<<FKKF#A####################,#FF,AFKK7A,A<AKKAKKKKFF77A7<,7,AAFKAFAFFKFKF7<

Does this look normal for HiSeq 4000?


Thanks a lot

All the best, Saby
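
As a side note on the "(-20)" in the log message quoted above, the following minimal Python sketch decodes the distinct quality characters that occur in the two posted reads, once with an ASCII offset of 33 and once with an offset of 64. Read with offset 64, the "," character comes out as -20, which would be consistent with the message. Whether sdm arrives at its -20 in exactly this way is an assumption, and the helper name `score` is made up for this illustration.

```python
# Distinct quality characters that occur in the two reads posted above.
SEEN = "#(,7<AFK"

def score(ch, offset):
    """Quality score of one FASTQ quality character for a given ASCII offset."""
    return ord(ch) - offset

# Print each character with its score under both offsets.
for ch in SEEN:
    print(f"{ch!r}: offset 33 -> {score(ch, 33):3d}   offset 64 -> {score(ch, 64):3d}")

# With offset 33 every score is non-negative; with offset 64
# most of these characters become negative.
offset33_scores = [score(ch, 33) for ch in SEEN]
offset64_scores = [score(ch, 64) for ch in SEEN]
```

Under offset 33 the same characters give scores between 2 and 42, so the file itself looks like an offset-33 encoding rather than an offset-64 one.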

Falk Hildebrand

Jun 2, 2020, 10:40:24 AM
to LotuS rRNA pipeline
Hey,
Hmm, just looking at the Wikipedia FASTQ page (chart pasted below), HiSeq 4000 should use the Illumina 1.8+ encoding, and your quality scores seem to be outside its typical range ("K" is Q42 under Phred+33, above the usual maximum of Q41). The interspersed "#" (= Q2) is also not very promising. I don't know, maybe it's a new Illumina protocol, but it looks strange to me. E.g. all the reads that contain "#" will be removed for low quality. Could you please ask your sequencing provider about the fastq format?
hth,
Falk
  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ.....................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126
  0........................26...31.......40                                
                           -5....0........9.............................40 
                                 0........9.............................40 
                                    3.....9..............................41 
  0.2......................26...31........41                              

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 41)
     with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold) 
     (Note: See discussion above).
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
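
The chart above can be turned into a rough check of which offset a file is likely to use. The sketch below is only an illustration of that idea, not how sdm detects the format: it scans the quality lines of a FASTQ stream (assuming the plain 4-lines-per-record layout), collects the lowest and highest ASCII code, and guesses the offset from the ranges in the chart. The function names `quality_range` and `guess_offset` are made up for this example, and the input is the first record from the post truncated to 10 bases.

```python
import io

def quality_range(handle):
    """Return (lowest, highest) ASCII code over all quality lines.

    Assumes the simple FASTQ layout with exactly 4 lines per record.
    """
    lo, hi = 127, 0
    for i, line in enumerate(handle):
        if i % 4 == 3:  # every 4th line is the quality string
            codes = [ord(c) for c in line.rstrip("\n")]
            if codes:
                lo, hi = min(lo, *codes), max(hi, *codes)
    return lo, hi

def guess_offset(lo, hi):
    """Guess the ASCII offset from the character range (see chart above)."""
    if lo < 59:
        return 33    # characters below ';' only occur with offset 33
    if hi > 74:
        return 64    # no low characters, but some above 'J'
    return None      # range fits both offsets, cannot decide

# First record from the post, truncated to 10 bases for brevity.
example = io.StringIO(
    "@SRR7515929.1 1/1\n"
    "AATTATTGCN\n"
    "+\n"
    "A<FFFKKFF#\n"
)

lo, hi = quality_range(example)
offset = guess_offset(lo, hi)
print("ASCII range:", lo, "-", hi, "guessed offset:", offset)
if offset is not None:
    print("score range:", lo - offset, "-", hi - offset)
```

For this record the guess is offset 33, with scores running from 2 ("#") to 42 ("K"), i.e. one above the typical Illumina 1.8+ maximum of 41 listed in the chart.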