Equivalent of the “extract_seqs_by_sample_id.py” script for FASTQ files?

Aude Locatelli

Aug 23, 2017, 1:21:31 PM

to Qiime 1 Forum

Hello,

I have a large FASTQ file containing sequences from 333 samples. From this file, I would like to extract the sequences belonging to 108 of these samples.

I found the script “extract_seqs_by_sample_id.py”, which looked promising because it lets me filter the sequences based on information from the mapping file:

extract_seqs_by_sample_id.py -i inseqs.fasta -o outseqs_by_mapping_field.fasta -m map.txt -s "Treatment:Fast"

However, this script works only with FASTA files, not with FASTQ files.

Does an equivalent script exist for FASTQ files?

Thanks a lot.

Colin Brislawn

Aug 23, 2017, 4:09:50 PM

to Qiime 1 Forum

Hello Aude,

Nope, Qiime does not include a script for this.

One thing you could do is convert your fastq file to a fasta file, then use that qiime script.
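A minimal sketch of that conversion step in Python, assuming standard four-line FASTQ records (header, sequence, `+` line, qualities) with no wrapped sequences. The function name `fastq_to_fasta` is hypothetical and not part of Qiime. The header text is copied unchanged, so the conversion does not add any sample information to it.

```python
import sys


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each four-line FASTQ record as a two-line FASTA record.

    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        seq = fastq_handle.readline().rstrip("\r\n")
        plus = fastq_handle.readline()
        fastq_handle.readline()  # quality line, dropped in FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record: %r" % header)
        # Replace the leading "@" with ">" and keep the rest of the header.
        fasta_handle.write(">" + header[1:].rstrip("\r\n") + "\n" + seq + "\n")
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python fastq_to_fasta.py in.fastq out.fasta
    with open(sys.argv[1]) as fq, open(sys.argv[2], "w") as fa:
        print(fastq_to_fasta(fq, fa), "records converted")
```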

Another thing you could do is use a piece of software outside of Qiime, like seqtk or esl-sfetch, to select the reads from these 108 samples.

https://github.com/lh3/seqtk

https://cryptogenomicon.org/2011/05/03/extracting-hmmer-results-to-sequence-files-easel-miniapplications/

I hope this helps,

Colin

Aude Locatelli

Sep 28, 2017, 8:59:04 PM

to Qiime 1 Forum

Hello Colin,

Sorry for the late reply and thanks a lot for your answer.

I converted my FASTQ file into a FASTA file and then ran the Qiime script “extract_seqs_by_sample_id.py” on it, but I got an empty file. After wondering why for a while, I found the answer!

In my FASTQ file, the sequence header lines do not contain any information about the sample ID, but once the samples are demultiplexed, they do. So I ran “extract_seqs_by_sample_id.py” on the seqs.fna file obtained after splitting the libraries, and it works!
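For anyone who needs to keep the quality scores, the same idea can be applied to a demultiplexed FASTQ file directly (for example, one written by split_libraries_fastq.py with its --store_demultiplexed_fastq option). The sketch below is not a Qiime script. It assumes four-line FASTQ records whose headers start with `@<SampleID>_<number>`, as in seqs.fna, and a tab-separated mapping file whose header line starts with `#SampleID`. All function names are hypothetical.

```python
def sample_ids_from_mapping(map_handle, field, value):
    """Return the set of sample IDs whose mapping-file column `field` equals `value`."""
    column = None
    ids = set()
    for line in map_handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if column is None:
            # The first "#SampleID" line names the columns.
            if line.startswith("#SampleID"):
                column = line.lstrip("#").split("\t").index(field)
            elif not line.startswith("#"):
                raise ValueError("mapping file has no #SampleID header line")
            continue
        if line.startswith("#"):
            continue  # other comment lines
        cells = line.split("\t")
        if len(cells) > column and cells[column] == value:
            ids.add(cells[0])
    return ids


def sample_id_from_header(header):
    """Extract 'S1' from a header such as '@S1_42 original_read_name ...'."""
    tokens = header[1:].split()
    if not tokens:
        raise ValueError("empty FASTQ header")
    # Assumed label format: sample ID, underscore, running read number.
    return tokens[0].rsplit("_", 1)[0]


def filter_fastq_by_sample(fastq_handle, out_handle, keep_ids):
    """Copy the records whose sample ID is in `keep_ids`; return how many were kept."""
    kept = 0
    while True:
        record = [fastq_handle.readline() for _ in range(4)]
        if not record[0].strip():
            break  # end of file (or a trailing blank line)
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError("not a four-line FASTQ record: %r" % record[0])
        if sample_id_from_header(record[0]) in keep_ids:
            out_handle.writelines(record)  # record is written unchanged
            kept += 1
    return kept
```

With hypothetical file names, this could be used as:

```python
with open("map.txt") as m:
    keep = sample_ids_from_mapping(m, "Treatment", "Fast")
with open("seqs.fastq") as fq, open("subset.fastq", "w") as out:
    filter_fastq_by_sample(fq, out, keep)
```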

Thank you,

Aude

Colin Brislawn

Sep 29, 2017, 10:16:38 AM

to Qiime 1 Forum

Good morning Aude,

I'm glad you found a way to make this work.

Feel free to open a new thread if you have any more questions.

Colin
