Converting multiple fastq files to fasta?


Yoann Perrin

Jun 1, 2016, 8:49:49 AM
to Qiime 1 Forum
Hi guys,
I have done multiple 16S amplicon sequencing experiments on an Illumina MiSeq.
I want to convert my multiple fastq files to fasta and qual.
At the moment I do it with convert_fastaqual_fastq.py, but I'm a little lazy and would like to do it with a single script, something like "multiple_convert_fastaqual_fastq.py".
Does such a script exist?

Thanks for your answers

abir...@gmail.com

Jun 1, 2016, 5:51:59 PM
to Qiime 1 Forum
Hi, Yoann,
I'm not aware of such a script, but it shouldn't be terribly hard to create one if you, or someone you have access to, can do a little Python scripting. Basically, I think all you'd need to do is create a list of your fastq file names and then loop over the list, calling convert_fastaqual_fastq.py for each one.
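A minimal sketch of that loop might look like the following. It assumes the reads are uncompressed `*.fastq` files sitting in one folder, that QIIME 1 is on the PATH, and that `-c fastq_to_fastaqual`, `-f` and `-o` are the right options for your version of convert_fastaqual_fastq.py (please check with `-h`). The function names and the one-output-folder-per-file layout are just my choices for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(fastq_dir, output_root):
    """Return one convert_fastaqual_fastq.py command per *.fastq file."""
    commands = []
    for fastq in sorted(Path(fastq_dir).glob("*.fastq")):
        # One output subfolder per input file (hypothetical layout).
        out_dir = Path(output_root) / fastq.stem
        commands.append([
            "convert_fastaqual_fastq.py",
            "-c", "fastq_to_fastaqual",  # conversion direction (assumed option)
            "-f", str(fastq),            # input fastq file
            "-o", str(out_dir),          # output directory
        ])
    return commands


def run_all(fastq_dir, output_root):
    """Run the conversions one after another, stopping at the first failure."""
    for cmd in build_commands(fastq_dir, output_root):
        print(" ".join(cmd))
        subprocess.check_call(cmd)
```

Calling `run_all("fastq_files", "fastaqual_out")` would then convert every fastq file in the (hypothetical) folder `fastq_files`. You can call `build_commands` on its own first to check the commands before anything is run.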
Best,
Amanda

TonyWalters

Jun 2, 2016, 1:33:07 PM
to Qiime 1 Forum
Yoann, might I ask why you want to convert the fastq files to fasta/qual format?

Yoann Perrin

Jun 3, 2016, 3:22:32 AM
to qiime...@googlegroups.com
Hi,

I have the feeling that split_libraries_fastq.py is less stringent than split_libraries.py.
If I'm not mistaken, I can't set the minimum and maximum read length, and I can't use a sliding-window quality score test. My primer contains degenerate nucleotides, but I can't change the number of mismatches allowed in the primer sequence.
That is why I want to convert my fastq files.

TonyWalters

Jun 3, 2016, 7:05:24 AM
to Qiime 1 Forum
split_libraries_fastq.py does do a truncation of the reads when it hits a string of low quality bases, which is similar to the sliding quality window check. There isn't a specific primer check to pull out the primers though.

Can you describe your read format? Do they all start with the primer sequence (no random bases)? If so, you could use extract_barcodes.py, specify a barcode length matching your primer length, and use the reads that are generated (with the primers written out separately) with split_libraries_fastq.py. http://qiime.org/scripts/extract_barcodes.html

Yoann Perrin

Jun 3, 2016, 7:44:42 AM
to qiime...@googlegroups.com
My reads look like this:

@M00801:223:000000000-AHMCG:1:1101:16430:1982 1:N:0:1
AAGACTCGGCAGCATCTCCACCTACGGGGCGCAGCAGTAGGGAATCTTCCGCAA......
+
ABCBBFFA?ADDGGGGGGGGGGHFHEDCEEFGGGGGHAF.........

The read starts with the Illumina adapter (identical in every sample), followed by my primer. The rest is my sequence.
Here is the original primer sequence:
5'-CCTACGGGRSGCAGCAG-3'

TonyWalters

Jun 3, 2016, 7:58:05 AM
to Qiime 1 Forum
One option would be to run extract_barcodes.py (or multiple_extract_barcodes.py) with the barcode length equal to the length of the adapter+primer above to remove those from the reads, and then the reads with the stripped data could be used with multiple_split_libraries_fastq.py.
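To work out that length for the read posted above, a small sketch like this could be used. It matches the degenerate primer against the read using the standard IUPAC codes. It assumes no mismatches beyond the degenerate positions and that every read carries the same adapter+primer prefix. The function names are made up for this example.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(read, primer):
    """Return the 0-based start of the first IUPAC match of primer, or -1."""
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        if all(base in IUPAC[code] for base, code in zip(window, primer)):
            return start
    return -1


def prefix_length(read, primer):
    """Length of everything up to and including the primer, or None."""
    start = find_primer(read, primer)
    return None if start < 0 else start + len(primer)


# Read and primer copied from the post above (read truncated as posted).
read = "AAGACTCGGCAGCATCTCCACCTACGGGGCGCAGCAGTAGGGAATCTTCCGCAA"
primer = "CCTACGGGRSGCAGCAG"
print(find_primer(read, primer), prefix_length(read, primer))
```

For the example read this prints `20 37`: the primer starts after a 20-base adapter, so adapter+primer is 37 bases. That would be the barcode length to give extract_barcodes.py (I believe the option is `--bc1_len`, but check `extract_barcodes.py -h`).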

It's cleaner in the development version of QIIME, but you'll want to get rid of the extracted "barcodes" files so they aren't detected as reads by multiple_split_libraries_fastq.py. You'll also probably want to use the -w option to see what the --sample_ids are set to, as cleaning these up to contain only alphanumeric and period (.) characters would be wise to avoid issues later.
You could move the unwanted files to another folder and then run the multiple_split_libraries_fastq.py command, which should then only pick up the single reads file.
Here's an example Linux command to get rid of the barcodes fastq files. I copied it from a command for stitched reads, so you'll have to change the "fastqjoin.un*" part to match the barcodes output filenames:
mkdir unjoined_file_dump
find input_dir/ -name "fastqjoin.un*" -print -exec mv {} unjoined_file_dump/ \;
where input_dir is the folder containing all of the subfolders with your joined reads.
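If you would rather do this step from Python, here is a rough equivalent, again only a sketch. The pattern `"barcodes*"` in the usage note is an assumed placeholder, so adjust it to whatever your barcodes files are actually called. Unlike the `find ... mv` command above, this version prefixes each moved file with its parent folder name. That is my own addition, so that same-named files from different subfolders do not overwrite each other in the dump folder.

```python
import shutil
from pathlib import Path


def move_matching(input_dir, dump_dir, pattern):
    """Move every file under input_dir whose name matches pattern into dump_dir."""
    dump = Path(dump_dir)
    dump.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() collects all matches first, so the tree is not changed mid-walk.
    for path in sorted(Path(input_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # Prefix with the parent folder name so same-named files do not collide.
        target = dump / "{}_{}".format(path.parent.name, path.name)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

For example, `move_matching("input_dir", "unjoined_file_dump", "barcodes*")` would move the matching files out of every subfolder of input_dir and return the list of new locations. Keep the dump folder outside input_dir so it is not scanned on a later run.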