Extract mapped sequences classified with GOTTCHA viral database

Scott

Oct 15, 2018, 10:24:25 AM

to edge-users

Hello,

I am trying to extract mapped sequences from the various taxonomy classification tools in EDGE. So far, extraction has worked for all but one of them. I am interested in extracting the reads that mapped to Enterobacteria phage M13 using the GOTTCHA viral database. When I try, I get the following error:

The requested URL /EDGE_output//c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Enterobacteria_phage_M13.fastq.zip was not found on this server.

The command generated by EDGE is:

/home/edge/edge/scripts/microbial_profiling/script/bam_to_fastq_by_taxa.pl -rank species -name "Enterobacteria phage M13" -prefix /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Enterobacteria_phage_M13 -se -zip -fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/allReads.fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/allReads-gottcha-speDB-v.bam

From the same project, I have successfully extracted reads that GOTTCHA mapped to other viruses. For example, sequences mapped to Bacillus phage phBC6A52 were extracted without problems. The generated command was:

/home/edge/edge/scripts/microbial_profiling/script/bam_to_fastq_by_taxa.pl -rank species -name "Bacillus phage phBC6A52" -prefix /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Bacillus_phage_phBC6A52 -se -zip -fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/allReads.fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/allReads-gottcha-speDB-v.bam

The only difference I can see between the two commands is "Enterobacteria phage M13" vs. "Bacillus phage phBC6A52".

Any help would be appreciated.

Thanks,

Scott

Lo, Chien-Chi

Oct 15, 2018, 6:05:56 PM

to Scott, edge-users

Hi Scott,

This is a database inconsistency issue. NCBI uses the name “Enterobacteria phage M13” for M13 in the genome FASTA/GenBank files, but its scientific name in the NCBI taxonomy database is “Escherichia virus M13”.

GOTTCHA's results use the name from the FASTA/GenBank file, whereas the extraction script gets the name from the NCBI taxonomy database and checks whether it matches a name in the GOTTCHA result. Because the two names differ, the match fails, which is presumably why the .fastq.zip file was never created.

https://www.ncbi.nlm.nih.gov/nuccore/NC_003287.2
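To make the mismatch concrete, here is a minimal Python sketch of the name check described above. It is not EDGE's or GOTTCHA's actual code: the two dictionaries are hypothetical stand-ins for the GOTTCHA report and the NCBI taxonomy lookup, and only the M13 names and the Bacillus phage name come from this thread. The scientific name shown for Bacillus phage phBC6A52 is assumed to be unchanged purely for illustration.

```python
# Hypothetical stand-in: names GOTTCHA reports, taken from genome FASTA/GenBank headers.
GOTTCHA_REPORTED = {"Enterobacteria phage M13", "Bacillus phage phBC6A52"}

# Hypothetical stand-in: requested name -> NCBI taxonomy scientific name.
# The M13 entry reflects the mismatch described in this thread; the Bacillus
# phage entry is ASSUMED to be identical to its genome name for illustration.
TAXONOMY_SCIENTIFIC_NAME = {
    "Enterobacteria phage M13": "Escherichia virus M13",
    "Bacillus phage phBC6A52": "Bacillus phage phBC6A52",
}


def strict_match(requested: str) -> bool:
    """Resolve the name via taxonomy, then require an exact match in the GOTTCHA report."""
    scientific = TAXONOMY_SCIENTIFIC_NAME.get(requested, requested)
    return scientific in GOTTCHA_REPORTED


def tolerant_match(requested: str) -> bool:
    """One possible workaround: accept either the requested name or its scientific name."""
    scientific = TAXONOMY_SCIENTIFIC_NAME.get(requested, requested)
    return requested in GOTTCHA_REPORTED or scientific in GOTTCHA_REPORTED


for name in ("Enterobacteria phage M13", "Bacillus phage phBC6A52"):
    print(f"{name}: strict={strict_match(name)}, tolerant={tolerant_match(name)}")
```

Under these assumptions the strict check fails only for M13, which matches the behavior reported in this thread. Comparing NCBI taxonomy IDs instead of name strings would sidestep renaming altogether, provided both the GOTTCHA output and the extraction script have access to them.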

This type of issue usually happens with viral taxa. I think it has been resolved in GOTTCHA2, but I will contact the developer to make sure that is true.

Thanks,

Chienchi

--
You received this message because you are subscribed to the Google Groups "edge-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to edge-users+...@googlegroups.com.
To post to this group, send email to edge-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/edge-users/2fe40d8c-e9f9-46a7-b853-b3d680b8e91a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scott

Oct 16, 2018, 1:22:07 PM

to edge-users

Thank you.