Equivalent of “extract_seqs_by_sample_id.py” for FASTQ files?

Aude Locatelli

Aug 23, 2017, 1:21:31 PM
to Qiime 1 Forum
Hello,

I have a large FASTQ file containing sequences from 333 samples, and I would like to extract the sequences belonging to 108 of them.

I found the script “extract_seqs_by_sample_id.py”, which looked promising because it can filter sequences based on information in the mapping file:

extract_seqs_by_sample_id.py -i inseqs.fasta -o outseqs_by_mapping_field.fasta -m map.txt -s "Treatment:Fast"
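For reference, the selection that -m map.txt -s "Treatment:Fast" asks for (keep every sample whose Treatment column is Fast) can be sketched in a few lines of Python. This is a minimal sketch, assuming a tab-separated QIIME 1 mapping file whose header row begins with #SampleID; the function name sample_ids_for_state is hypothetical, and only a single Field:Value pair is handled, not the full syntax the real script accepts.

```python
def sample_ids_for_state(mapping_lines, state):
    """Return sample IDs whose mapping-file column matches a 'Field:Value' string.

    Hypothetical helper: handles one Field:Value pair only, assuming a
    tab-separated QIIME 1 style mapping file with a '#SampleID' header row.
    """
    field, value = state.split(":", 1)
    header = None
    selected = []
    for line in mapping_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        if cols[0] == "#SampleID":
            header = cols  # column names, used to locate `field`
            continue
        if cols[0].startswith("#"):
            continue  # other comment lines
        if header is None:
            raise ValueError("mapping file has no #SampleID header line")
        if cols[header.index(field)] == value:
            selected.append(cols[0])
    return selected


example = [
    "#SampleID\tBarcodeSequence\tTreatment\tDescription",
    "S1\tAAAA\tFast\tfirst",
    "S2\tCCCC\tControl\tsecond",
    "S3\tGGGG\tFast\tthird",
]
print(sample_ids_for_state(example, "Treatment:Fast"))  # ['S1', 'S3']
```

With a real file, pass an open handle, e.g. sample_ids_for_state(open("map.txt"), "Treatment:Fast").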

However, this script only works with FASTA files, not FASTQ files.

Is there an equivalent script for FASTQ files?

Thanks a lot.

Colin Brislawn

Aug 23, 2017, 4:09:50 PM
Hello Aude,

Nope, Qiime does not include a script for this.

One thing you could do is convert your fastq file to a fasta file, then use that qiime script.
Another thing you could do is use software outside of Qiime, such as seqtk or esl-sfetch, to select the reads from these 108 samples.
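The FASTQ-to-FASTA conversion mentioned above is what seqtk seq -a in.fastq > out.fasta does. A minimal Python sketch of the same idea, assuming plain 4-line FASTQ records (no wrapped sequence or quality lines), keeps the header and sequence and drops the "+" and quality lines. The function name fastq_to_fasta and the file names are hypothetical.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines ('>header', 'SEQ') from 4-line FASTQ records.

    Assumes unwrapped FASTQ: header, sequence, '+' line, quality line.
    Streams the input, so a large file is never fully loaded into memory.
    """
    it = (line for line in fastq_lines if line.strip())  # ignore blank lines
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, _qual = record  # quality is discarded
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield ">" + header[1:]
        yield seq


# Hypothetical file names; stream the conversion line by line:
# with open("in.fastq") as src, open("out.fasta", "w") as dst:
#     for line in fastq_to_fasta(src):
#         dst.write(line + "\n")

example = ["@read1 desc\n", "ACGT\n", "+\n", "IIII\n",
           "@read2\n", "GGCC\n", "+read2\n", "####\n"]
print(list(fastq_to_fasta(example)))  # ['>read1 desc', 'ACGT', '>read2', 'GGCC']
```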

I hope this helps,
Colin

Aude Locatelli

Sep 28, 2017, 8:59:04 PM
Hello Colin,

Sorry for the late reply and thanks a lot for your answer.

I converted my FASTQ file into a FASTA file and ran the Qiime script “extract_seqs_by_sample_id.py” on it, but the output file was empty. After puzzling over it for a while, I figured out why.

In my FASTQ file, the sequence headers do not contain any sample ID information, but after demultiplexing they do. So I ran “extract_seqs_by_sample_id.py” on the seq.fna file produced by splitting the library, and it works!
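The key point here is that sample IDs only appear in the read headers after demultiplexing. If the demultiplexed reads were also available as FASTQ with QIIME 1 style labels such as "SampleID_0 ..." (an assumption in this sketch), the same sample-based selection could be done directly on FASTQ records by reading the sample ID from the first header token, up to its last underscore. The helper names sample_id_from_label and filter_fastq_by_sample are hypothetical.

```python
from itertools import islice


def sample_id_from_label(header):
    """'@S1_0 M001:1:1 ...' -> 'S1' (assumes SampleID_N as the first token)."""
    first_token = header[1:].split()[0]  # drop '@', keep the first token
    return first_token.rsplit("_", 1)[0]  # strip the trailing _N read counter


def filter_fastq_by_sample(fastq_lines, wanted_ids):
    """Yield the raw lines of every 4-line FASTQ record from a wanted sample."""
    wanted = set(wanted_ids)
    it = (line for line in fastq_lines if line.strip())  # ignore blank lines
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', quality
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        if sample_id_from_label(record[0]) in wanted:
            yield from record  # newlines preserved, ready to write out


demux = [
    "@S1_0 M001:1:1 orig_bc=AAAA new_bc=AAAA bc_diffs=0\n", "ACGT\n", "+\n", "IIII\n",
    "@S2_1 M001:1:2 orig_bc=CCCC new_bc=CCCC bc_diffs=0\n", "GGCC\n", "+\n", "####\n",
    "@S1_2 M001:1:3 orig_bc=AAAA new_bc=AAAA bc_diffs=0\n", "TTAA\n", "+\n", "FFFF\n",
]
kept = list(filter_fastq_by_sample(demux, {"S1"}))
print(len(kept) // 4)  # 2
```

The set of wanted IDs could come from a list of the 108 sample IDs or from a mapping-file lookup.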

Thank you,

Aude

Colin Brislawn

Sep 29, 2017, 10:16:38 AM
Good morning Aude,

I'm glad you found a way to make this work. 

Feel free to open a new thread if you have any more questions.

Colin
