Duplicate ID found in FASTA/qual file

DBH

Jan 21, 2015, 3:10:31 PM

to qiime...@googlegroups.com

Hi,

I have a FASTA file and a qual file from the sequencing center, which joined the paired-end MiSeq reads for us.

I ran split_libraries.py as follows:

qiime@qiime-VirtualBox:~$ split_libraries.py -f illuminaseq.fasta -q illuminaseq.qual -m mappingfile.txt -o split_library_output -b

The run fails with this error:

"Duplicate ID found in FASTA/qual file: %s" % label

ValueError: Duplicate ID found in FASTA/qual file: M02542:8:000000000-A7F0J:1:1102:17729:24017 1:N:0:1

I then ran grep -c on that ID:

~$ grep -c "M02542:8:000000000-A7F0J:1:1102:17729:24017" illuminaseq.fasta

The result was 2.
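Note that grep -c only checks the one ID named in the error, so other repeated IDs could still be hiding in the file. A minimal Python sketch that lists every repeated header is given here. The function names are hypothetical, and it assumes that each record header starts with ">" and that the full header line is what gets compared, which is what the error message appears to show.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the full header label (the text after '>') of each record."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def duplicate_ids(lines):
    """Return {label: count} for every label that occurs more than once."""
    counts = Counter(fasta_ids(lines))
    return {label: n for label, n in counts.items() if n > 1}
```

To use it on the file from this thread, open illuminaseq.fasta and pass the file handle to duplicate_ids(); an empty dict means no header is repeated. The same check can be run on illuminaseq.qual, since qual headers also start with ">".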

Is there a way to remove one of these IDs from the fasta file?

Thanks

zhenjiang zech xu

Jan 21, 2015, 5:05:17 PM

to qiime-forum

I would first look into why there are duplicates, to make sure the end joining did not introduce errors. Illumina runs normally don't generate duplicate IDs. But if you just want to push through the analysis pipeline, you can change one of the IDs to a different one. Make sure you make the same change in the qual file too.
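As a rough sketch of that renaming step in Python (not part of QIIME; the function names and the "_dup" suffix are made up for illustration): the first occurrence of each header is kept, and later repeats get a numbered suffix on the read ID. It assumes the FASTA and qual files carry the same headers in the same order, so running the same function on both keeps them in sync.

```python
def rename_duplicates(lines, suffix="_dup"):
    """Yield lines, making repeated '>' header labels unique.

    The first occurrence of a label is left unchanged. The n-th repeat
    gets '<suffix><n>' appended to the first space-delimited token.
    """
    seen = {}  # label -> number of times seen so far
    for line in lines:
        if line.startswith(">"):
            label = line[1:].rstrip("\n")
            n = seen.get(label, 0)
            seen[label] = n + 1
            if n:
                read_id, sep, rest = label.partition(" ")
                line = ">%s%s%d%s%s\n" % (read_id, suffix, n, sep, rest)
        yield line


def rewrite_file(src, dst, suffix="_dup"):
    """Stream src to dst, renaming repeated headers along the way."""
    with open(src) as handle, open(dst, "w") as out:
        out.writelines(rename_duplicates(handle, suffix))
```

For example, rewrite_file("illuminaseq.fasta", "illuminaseq.dedup.fasta") followed by rewrite_file("illuminaseq.qual", "illuminaseq.dedup.qual"), where the output file names are placeholders. This only hides the symptom, so it is still worth finding out why the joined file has repeated IDs.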


DBH

Jan 22, 2015, 6:03:30 PM

to qiime...@googlegroups.com

Thanks. I think I will try to join the paired reads myself rather than rely on the joined reads sent by the sequencing center.

zhenjiang zech xu

Jan 23, 2015, 2:24:42 AM

to qiime-forum

Another thing: it seems odd to us that you are using split_libraries.py instead of split_libraries_fastq.py for Illumina reads. You also didn't provide a value for the -b option. If your reads are already demultiplexed, you can use multiple_split_libraries_fastq.py instead.

DBH

Jan 23, 2015, 12:50:24 PM

to qiime...@googlegroups.com

Yes, after some more reading I realized that I should be using split_libraries_fastq.py rather than split_libraries.py for Illumina data. I ended up running extract_barcodes.py on the fastq file to get the barcodes file.
