Hello Qiime community,
I need to upload some 16S and ITS community sequencing data to the SRA, which requires demultiplexed FASTQ files for submission. My data came from an Illumina MiSeq run using a paired-end protocol, and the raw yeast and bacterial reads are in R1 and R2 FASTQ files. I first joined the paired ends, demultiplexed the joined sequences file with split libraries, and then ran make_fastq.py to generate a FASTQ file for each library (sample) in my dataset. However, the upload failed with errors, and an SRA curator I spoke with noticed that the FASTQ headers aren't formatted correctly.
My FASTQ sequence headers look like this:
@81_1000000399 read_id=M02034:33:000000000-A7RA6:1:1101:13173:1946 barcode=CGGCATAT
But SRA needs them to look like this, with no sample ID, no barcode, and no labels:
@M02034:33:000000000-A7RA6:1:1101:13173:1946
I'm not sure how to reformat the headers in my FASTQ files. I looked for flags to pass to the Qiime scripts I ran, but didn't find anything relevant to this issue. I'm sure this can be done with code, but I'm new to coding and the command line, and it isn't clear to me how to edit the sequence headers. Is there something I'm forgetting or not considering? For anyone who has used Qiime to process data for SRA upload, did you do anything differently?
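A minimal sketch of the kind of script that could do this, assuming every header contains a `read_id=` field like the example above, the files are uncompressed, and each record is exactly four lines. It keeps only the value after `read_id=` and writes a bare `+` separator line (valid FASTQ). The function names, the script name, and the `.sra.fastq` output suffix are hypothetical choices for illustration, not part of Qiime.

```python
import sys
from pathlib import Path


def sra_header(qiime_header: str) -> str:
    """Turn '@81_1000000399 read_id=<Illumina ID> barcode=...' into '@<Illumina ID>'.

    Assumes the Qiime-style header carries a 'read_id=' field (as in the example).
    """
    if not qiime_header.startswith("@"):
        raise ValueError(f"not a FASTQ header line: {qiime_header!r}")
    for field in qiime_header[1:].split():
        if field.startswith("read_id="):
            return "@" + field[len("read_id="):]
    raise ValueError(f"no read_id= field in header: {qiime_header!r}")


def reformat_fastq(lines):
    """Yield FASTQ lines (without newlines) with headers rewritten.

    Assumes strict 4-line records: header, sequence, '+' line, quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        pos = i % 4
        if pos == 0:
            yield sra_header(line)
        elif pos == 2:
            if not line.startswith("+"):
                raise ValueError(f"expected '+' line, got: {line!r}")
            yield "+"  # drop any repeated header text after the '+'
        else:
            yield line  # sequence and quality lines are unchanged


def main(paths):
    # Writes a new file next to each input instead of overwriting it.
    for path in map(Path, paths):
        out = path.with_name(path.stem + ".sra.fastq")
        with path.open() as src, out.open("w") as dst:
            for line in reformat_fastq(src):
                dst.write(line + "\n")
        print(f"{path} -> {out}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `fix_headers.py`, it could be run on all the per-sample files at once with `python fix_headers.py *.fastq`. Writing to new files keeps the original make_fastq.py output intact in case the assumptions above don't hold for every file.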