quality string length is not equal to sequence length, fix your fastq file


rsklav

Feb 21, 2018, 2:45:09 PM
to rna-star
Hi all, 

As has been discussed before on this forum, I tried running STAR on my fastq files and ran into this error message. It turned out I had the same issue as other people: my last read had only 90 quality characters, but only in one of the two fastq files; the other one was fine. Instead of deleting that last read and then looking for the corresponding read in the other fastq file to delete, I added 10 fake quality characters to the incomplete last read. I then ran into another issue: read1 and read2 are not consistent, and STAR reaches the end of one file before the other. Can you help?

Thanks!


Alexander Dobin

Feb 23, 2018, 1:22:48 PM
to rna-star
Hi @rsklav,

you need to make sure that the number of reads (lines) in the two files is exactly the same. You can count the lines with
$ wc -l r1.fq r2.fq
and then cut both files to the minimum of these numbers (minN):
$ head -n minN r1.fq > r1.fixed.fq
$ head -n minN r2.fq > r2.fixed.fq

For FASTQ files minN should be divisible by 4 since you have 4 lines per read.
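
The steps above can be sketched as a small shell function. This is only an illustration, assuming uncompressed files whose names end in .fq; the function name trim_fastq_pair and the .fixed.fq output names are made up for this example. It only equalizes the line counts and does not check that the remaining reads are still correctly paired.

```shell
# Hypothetical helper: truncate two paired FASTQ files to the same number
# of complete 4-line records. Assumes uncompressed input.
trim_fastq_pair() {
    r1=$1
    r2=$2
    # Arithmetic expansion strips any padding that wc may print.
    n1=$(( $(wc -l < "$r1") ))
    n2=$(( $(wc -l < "$r2") ))
    # Take the smaller of the two line counts ...
    if [ "$n1" -lt "$n2" ]; then minN=$n1; else minN=$n2; fi
    # ... and round it down to a multiple of 4 (4 lines per read).
    minN=$(( minN / 4 * 4 ))
    head -n "$minN" "$r1" > "${r1%.fq}.fixed.fq"
    head -n "$minN" "$r2" > "${r2%.fq}.fixed.fq"
    # Report how many lines were kept in each file.
    echo "$minN"
}
```

Calling trim_fastq_pair r1.fq r2.fq writes r1.fixed.fq and r2.fixed.fq and prints the number of lines kept.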

Cheers
Alex

Nasima

Apr 16, 2018, 2:56:54 PM
to rna-star

Hi Alex,


I have the same problem. My data are human heart failure PE sequences, with four separate .fastq files for the first read and four .fastq files for the second read for each patient:


P1.1.R1.fastq.gz
P1.2.R1.fastq.gz
P1.3.R1.fastq.gz
P1.4.R1.fastq.gz

P1.1.R2.fastq.gz
P1.2.R2.fastq.gz
P1.3.R2.fastq.gz
P1.4.R2.fastq.gz

 

I’ve run the following script for all patients’ fastq files, and it works fine for every patient except one:

 

STAR \
--runThreadN 32 \
--genomeDir /HF2-STAR/genome \
--sjdbGTFfile /Homo_sapiens.GRCh38.79.gtf \
--readFilesIn P1.1.R1.fastq.gz,P1.2.R1.fastq.gz,P1.3.R1.fastq.gz,P1.4.R1.fastq.gz P1.1.R2.fastq.gz,P1.2.R2.fastq.gz,P1.3.R2.fastq.gz,P1.4.R2.fastq.gz \
--readFilesCommand zcat \
--outFilterMultimapNmax 20 \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM GeneCounts \
--outFilterMatchNminOverLread 0 \
--outFilterScoreMinOverLread 0 \
--outFileNamePrefix "p1"

 

For one patient I got the following error:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@NB501373:8:HTTKYBGXX:4:22403:20084:1317
GCGATCTTTTTCAATACAATTTACACCCTCATCCCCATTTCCAGTCTGATTATACAAGTGCTAAGTGGCAGAA
@NB501373:8:HTTKYBGXX:4:13504:19963:7094 47732852 N 3
SOLUTION: fix your fastq file

 


I tried to remove these lines from the fastq files, but then I got another error message:

 

EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted

 


I checked the number of lines in the two files with $ wc -l r1.fq r2.fq and it was the same (both were 56759618, which is not divisible by 4, though).
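
For gzipped inputs like the ones listed above, the same check can be done without unpacking the files to disk. A minimal sketch, assuming gzip is available; the function name fastq_line_report is made up for this example. For each file it prints the file name, the line count, and the remainder after dividing by 4, which is 0 for a well-formed FASTQ file. A count of 56759618 leaves a remainder of 2.

```shell
# Hypothetical helper: report line count and (line count mod 4) for each
# gzipped FASTQ file given as an argument.
fastq_line_report() {
    for f in "$@"; do
        # gzip -dc decompresses to stdout, like zcat.
        n=$(( $(gzip -dc "$f" | wc -l) ))
        echo "$f $n $(( n % 4 ))"
    done
}
```

For example, fastq_line_report P1.1.R1.fastq.gz P1.1.R2.fastq.gz prints one line per file; any non-zero value in the last column points to an incomplete record somewhere in that file.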

 

Right now I am stuck with this patient. Any help is much appreciated. 


Thank you,

Nasima

Alexander Dobin

Apr 16, 2018, 6:12:48 PM
to rna-star
Hi Nasima,

it seems like that read is missing the quality scores. The total number of lines in FASTQ has to be divisible by 4.
Please post the output of
$ grep -B4 -A8 "^@NB501373:8:HTTKYBGXX:4:22403:20084:1317" r1.fq 
$ grep -B4 -A8 "^@NB501373:8:HTTKYBGXX:4:22403:20084:1317" r2.fq
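
When the offending read ID is not known in advance, a positional scan can locate the first record that breaks the 4-line layout. This is a rough sketch, assuming an uncompressed FASTQ file with unwrapped sequence lines; the function name first_bad_record is made up for this example. It prints the line number where the layout first goes wrong, or nothing if the structure looks fine.

```shell
# Hypothetical helper: find the first place where the 4-line record layout
# breaks. Lines are checked by position only, so a quality string that
# happens to start with "@" or "+" is not mistaken for a header.
first_bad_record() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print NR ": expected @readID"; bad = 1; exit }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print NR ": expected +"; bad = 1; exit }
        END { if (!bad && NR % 4 != 0) print NR ": file ends mid-record" }
    ' "$1"
}
```

A read that has its ID and sequence but is followed directly by the next read ID, as in the error message quoted earlier in this thread, would be reported at the line where the + separator was expected.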

Cheers
Alex

Samuel Mo

May 6, 2020, 10:05:33 AM
to rna-star
Hi Alex,

Can I follow up on this thread even though it's been some time?

I'm running into a similar error. I've gotten other samples to work, but a few fail as in the original question.

Can you explain why the number of lines in the fastq file has to be divisible by 4, and what you mean by the quality scores?

Alexander Dobin

May 9, 2020, 7:35:10 PM
to rna-star
Hi Samuel,

each read is represented by 4 lines in the FASTQ file:
@readID
sequence
+
quality scores
e.g.:
@D00102:CBR8BANXX171122:CBR8BANXX:1:1101:10003:18149 1:N:0:TAAGGCGACTCTCTAT
GCTCGTAGGAGCGTATCATCAGCTGGGTGCCGTTCTTCCTGTCTCTTATAC
+
/B<</BFFFFFFFFBBFFFFFFFFFFFFFFFFBFFFFFFFFFBBFFBFFFF

The length of the quality string should be equal to the sequence length.
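
This rule can be checked across a whole file with awk. A minimal sketch, assuming an uncompressed FASTQ file that already has exactly 4 lines per read; the function name check_quality_lengths is made up for this example. It prints one line for every read whose quality string and sequence differ in length.

```shell
# Hypothetical helper: list reads whose quality string length differs from
# the sequence length. Assumes strict 4-line records.
check_quality_lengths() {
    awk '
        NR % 4 == 1 { id = $0 }              # line 1 of a record: @readID
        NR % 4 == 2 { seqlen = length($0) }  # line 2: sequence
        NR % 4 == 0 && length($0) != seqlen {
            # line 4: quality scores, compared with the stored sequence length
            print id ": sequence " seqlen ", quality " length($0)
        }
    ' "$1"
}
```

No output means every read passed the check. For gzipped files the data would need to be decompressed first.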

Cheers
Alex

