Hi all,
I am having the same issue with fastq input file formatting as outlined above. I am working with STAR v. 2.4.2a. The error I get is as follows:
"ERROR_00201: EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >"
Some background:
1. My reads are all pulled down from the SRA and converted into fastq.
2. I've tried running STAR with both gzipped and bzipped input, but all efforts reported below were with unzipped fastq files.
The raw SRA reads look like this. You can see that neither the sequence nor the quality-score lines are wrapped, and the headers are standard for SRA fastq (based on Wikipedia).
@SRR805129.1 HWI-ST485:135522712:C1T0BACXX:1:1101:1210:2112 length=100
CAGCTGCAGACCCAGATGATGGAAGAAGGGGAAAAGGTGAAGGAAAAACTCAAGAGGGAGCTGGAGCAGCTGCAGGCAGATGTTGCTCCCTTTCTGGTGG
+SRR805129.1 HWI-ST485:135522712:C1T0BACXX:1:1101:1210:2112 length=100
@BCDFFFFHFHHHJIDIJEHIJGGHAGHIJJFGGIJJHHGJIJIIJIGIHIIJJJJJJIJHHHHFFFFEEEEACDBDDDDDDDCDCDCDDDDDCDCCC?B
@SRR805129.2 HWI-ST485:135522712:C1T0BACXX:1:1101:1190:2137 length=100
CTCCAATGGCGACCCAAAATGCCAAAGTGATTCCAGTAGCAAGACCACCAAAGGCACCTTTCCAATTGGCACAAGGAAAAATAATTCACAGTGTAAATAC
+SRR805129.2 HWI-ST485:135522712:C1T0BACXX:1:1101:1190:2137 length=100
???D=D>3B?C:FG1C;<;?GEEHD>>?@4??990:09BBB>>;)?88;B(6;C9=D()=EHEECED;;).;;AC=(;>=955@################
@SRR805129.3 HWI-ST485:135522712:C1T0BACXX:1:1101:1106:2179 length=100
TCACTCAGAATTGAGTTTTTGTTATGGTTTGATTAAGTGTGTATCCTGTAAATAATGGGAATCAGTGTGTTAGTCCCCCTATGATGGCAAAGACGGCCCC
+SRR805129.3 HWI-ST485:135522712:C1T0BACXX:1:1101:1106:2179 length=100
@@@DDFDFBFFHHHHFHIJJIHJAHHIIIIH>GHEEHGHIGHGIGJIGGFHGICGGGIIIIJFHGIIGIHJIJJDGHIFHECDEFCDEDACC>=@B88?@
I've quality-trimmed the reads using Trimmomatic, which keeps the same header format. Some reads are dropped entirely and others are trimmed to a shorter length, so the length field in the header is no longer accurate for some reads, but given the error above this does not seem to be the issue. These quality-trimmed reads produced my first instance of the above error. I've since investigated whether characters in the headers caused the problem by replacing all spaces, equals signs, and periods with underscores or dashes, but I am still receiving the same error. The last set of reads I tried looked like this:
@SRR805129-1_HWI-ST485:135522712:C1T0BACXX:1:1101:1210:2112_length-100
CAGCTGCAGACCCAGATGATGGAAGAAGGGGAAAAGGTGAAGGAAAAACTCAAGAGGGAGCTGGAGCAGCTGCAGGCAGATGTTGCTCCCTTTCTGGTGG
+SRR805129-1_HWI-ST485:135522712:C1T0BACXX:1:1101:1210:2112_length-100
@BCDFFFFHFHHHJIDIJEHIJGGHAGHIJJFGGIJJHHGJIJIIJIGIHIIJJJJJJIJHHHHFFFFEEEEACDBDDDDDDDCDCDCDDDDDCDCCC?B
@SRR805129-3_HWI-ST485:135522712:C1T0BACXX:1:1101:1106:2179_length-100
TCACTCAGAATTGAGTTTTTGTTATGGTTTGATTAAGTGTGTATCCTGTAAATAATGGGAATCAGTGTGTTAGTCCCCCTATGATGGCAAAGACGGCCCC
+SRR805129-3_HWI-ST485:135522712:C1T0BACXX:1:1101:1106:2179_length-100
@@@DDFDFBFFHHHHFHIJJIHJAHHIIIIH>GHEEHGHIGHGIGJIGGFHGICGGGIIIIJFHGIIGIHJIJJDGHIFHECDEFCDEDACC>=@B88?@
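Since the records above look well-formed by eye, one way to rule out problems that are invisible in a pasted excerpt (Windows line endings, stray blank lines, a truncated last record, or a byte-order mark before the first @) would be a small check along these lines. This is only a sketch, not anything STAR itself does: the function name first_fastq_problem is made up for illustration, and it assumes unwrapped four-line fastq records like the ones shown. It parses by position, so a quality line that happens to begin with @ (as in the first read) is not mistaken for a header.

```python
def first_fastq_problem(lines):
    """Return (line_number, message) for the first problem found, or None.

    `lines` is any iterable of text lines, e.g. a file opened with
    newline="" so that carriage returns are not silently translated.
    Assumes unwrapped 4-line FASTQ records (an assumption of this sketch).
    """
    line_no = 0
    record = []
    for line_no, raw in enumerate(lines, start=1):
        # Carriage returns usually mean the file passed through Windows tools.
        if "\r" in raw:
            return (line_no, "carriage return (Windows line ending)")
        text = raw.rstrip("\n")
        # A blank line shifts every later record out of the 4-line frame.
        if text == "":
            return (line_no, "blank line")
        record.append(text)
        if len(record) < 4:
            continue
        header, seq, sep, qual = record
        first = line_no - 3  # line number of this record's header
        if not header.startswith("@"):
            return (first, "header does not start with '@'")
        if not sep.startswith("+"):
            return (first + 2, "separator does not start with '+'")
        if len(seq) != len(qual):
            return (first + 3, "sequence and quality lengths differ")
        record = []
    if record:
        return (line_no, "file ends in the middle of a record")
    return None
```

To run it on a file, something like first_fastq_problem(open("reads.fastq", newline="")) should work, where reads.fastq is a placeholder path. A result of None only means the file passes these particular checks; it does not prove that STAR will accept it.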
I'm pretty stumped, so if anyone sees something I am missing, please let me know. I appreciate any help people can provide.
Thanks,
Daren