Formatting Demultiplexed FASTQ files for NCBI SRA Submission


zabe...@ucdavis.edu

Nov 9, 2015, 6:52:54 PM
to Qiime 1 Forum
Hello Qiime community,

I need to upload some community 16S and ITS sequencing data to the SRA, which requires demultiplexed FASTQ files for submission. My data came from an Illumina MiSeq run using a paired-end protocol, and the raw yeast and bacteria data are contained in R1 and R2 FASTQ files. I first joined the paired ends, demultiplexed the joined sequences file using split libraries, and then ran the script make_fastq.py to generate a FASTQ file for each library (sample) in my dataset. However, I received errors when I tried to upload these files. I talked to an SRA curator, who noticed that the FASTQ headers aren't formatted correctly.

My FASTQ sequence headers look like this:
@81_1000000399 read_id=M02034:33:000000000-A7RA6:1:1101:13173:1946 barcode=CGGCATAT

But SRA needs them to look like this (no sample or barcode information, and no labels):
@M02034:33:000000000-A7RA6:1:1101:13173:1946

I'm not sure how to format the headers in my FASTQ files properly.  I looked for flags to pass in the Qiime scripts I ran, but I didn't see anything that was relevant to this particular issue.  I'm sure this can be done with code, but I'm new to coding and the command line, and the process for editing the sequence headers isn't clear to me.  Is there something I'm forgetting, or just not considering?  For anyone who has used Qiime to process data for SRA upload, did you do anything differently?
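A minimal Python sketch of the header change described above, assuming every header carries the original Illumina read ID in a `read_id=` field exactly as in the example. The regex and the function name `to_sra_header` are illustrative, not part of QIIME:

```python
import re

# Assumed header layout, taken from the example above:
# "@<sample>_<n> read_id=<Illumina read ID> barcode=<barcode>"
READ_ID = re.compile(r"\bread_id=(\S+)")


def to_sra_header(header: str) -> str:
    """Keep only the original Illumina read ID, prefixed with '@'."""
    match = READ_ID.search(header)
    if match is None:
        raise ValueError(f"no read_id= field in header: {header!r}")
    # \S+ stops at the next whitespace, so the barcode= label and any
    # trailing newline are dropped.
    return "@" + match.group(1)


if __name__ == "__main__":
    qiime = ("@81_1000000399 read_id=M02034:33:000000000-A7RA6:1:1101:13173:1946"
             " barcode=CGGCATAT")
    print(to_sra_header(qiime))
    # prints @M02034:33:000000000-A7RA6:1:1101:13173:1946
```

A header without a `read_id=` field raises an error rather than being passed through silently, so a file in an unexpected format is noticed before upload.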

Kyle Bittinger

Nov 10, 2015, 10:28:34 AM
to Qiime 1 Forum
I'm consulting with someone in my lab who does SRA submissions regularly.  He's off today, so it may be more than 24h before we can get back to you.  However, I'll be sure to get his protocol to you within 2 days.

Some other QIIME folks have had luck submitting to the European archive, ENA (European Nucleotide Archive).

zabe...@ucdavis.edu

Nov 16, 2015, 12:59:09 PM
to Qiime 1 Forum
Hi Kyle,

Were you able to obtain a protocol?

Antonio González Peña

Nov 17, 2015, 9:19:58 AM
to Qiime 1 Forum
Hello,

Sorry for the slow reply.

We haven't submitted to SRA in a really long time because of complicated submissions and retrievals. With this in mind, we don't have a clear path to help you format your files. That said, perhaps you could remove all non-required fields from the headers with sed?
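One caution with the sed route: FASTQ quality lines can also begin with `@`, so it is safer to rewrite only the first line of each four-line record. A hedged Python sketch of that approach follows. It assumes uncompressed FASTQ with four-line records and a `read_id=` field in every header; the helper names `rewrite_fastq` and `rewrite_dir` are hypothetical:

```python
import re
from pathlib import Path

# Assumed QIIME header layout: "@<sample>_<n> read_id=<Illumina ID> barcode=<bc>"
READ_ID = re.compile(r"\bread_id=(\S+)")


def rewrite_fastq(src: Path, dst: Path) -> int:
    """Copy src to dst, keeping only the Illumina read ID in each header.

    Headers are found by position (line 1 of every 4), not by a leading '@',
    because quality strings may also start with '@'. Returns the record count.
    """
    records = 0
    n_lines = 0
    with src.open() as fin, dst.open("w") as fout:
        for i, line in enumerate(fin):
            n_lines = i + 1
            if i % 4 == 0:  # header line
                match = READ_ID.search(line)
                if not line.startswith("@") or match is None:
                    raise ValueError(f"{src}:{i + 1}: unexpected header {line!r}")
                line = "@" + match.group(1) + "\n"
                records += 1
            elif i % 4 == 2:  # separator line; drop any repeated title
                line = "+\n"
            fout.write(line)  # sequence and quality lines are copied unchanged
    if n_lines % 4:
        raise ValueError(f"{src}: {n_lines} lines is not a multiple of 4")
    return records


def rewrite_dir(in_dir: Path, out_dir: Path) -> None:
    """Rewrite every *.fastq file in in_dir into out_dir under the same name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for fastq in sorted(in_dir.glob("*.fastq")):
        n = rewrite_fastq(fastq, out_dir / fastq.name)
        print(f"{fastq.name}: {n} records")
```

For example, `rewrite_dir(Path("per_sample_fastq"), Path("sra_ready"))` would write cleaned copies of every per-sample `.fastq` file into a new folder (both folder names are placeholders). The separator line is reset to a bare `+`, which the FASTQ format allows, so a repeated title there does not reintroduce the QIIME labels.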

Now, we use Qiita (http://qiita.microbio.me/) to submit to EBI/ENA. The process is really simple: you create a new study, then create a sample template and a prep template (or submit a Qiime mapping file instead; more info on the required fields is here: http://qiita.microbio.me/static/doc/html/tutorials/prepare-templates.html, and pay special attention to the fields required for EBI submission). Once you have valid, processed sample and prep templates, you can add your raw data (ideally just the barcodes plus forward and reverse reads, but you could also upload barcodes and your paired-end data), preprocess it (split libraries), and process it (pick OTUs). Finally, you can request that the study be made private and submitted to EBI; the system takes care of making all the submissions and, when you request it, makes the data public on both sites. More info on this process can be found here: http://qiita.microbio.me/static/doc/html/tutorials/getting-started.html

If you run into any issues, you can send an email to qiita...@gmail.com.

Hope this helps. 