Extract mapped sequences classified with GOTTCHA viral database


Scott

Oct 15, 2018, 10:24:25 AM
to edge-users
Hello,
I am trying to extract mapped sequences from the various taxonomy classification tools in EDGE. To date, all but one have extracted successfully. I am interested in extracting the reads that mapped to Enterobacteria phage M13 using the GOTTCHA viral database. When doing so, I get the following error:

The requested URL /EDGE_output//c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Enterobacteria_phage_M13.fastq.zip was not found on this server. 

The script generated by EDGE is:

/home/edge/edge/scripts/microbial_profiling/script/bam_to_fastq_by_taxa.pl -rank species  -name "Enterobacteria phage M13" -prefix /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Enterobacteria_phage_M13 -se -zip  -fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/allReads.fastq  /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/allReads-gottcha-speDB-v.bam

I have extracted reads mapped to other viruses by GOTTCHA from the same project. For example, sequences mapped to Bacillus phage phBC6A52 were successfully extracted.  The generated script was:

/home/edge/edge/scripts/microbial_profiling/script/bam_to_fastq_by_taxa.pl -rank species  -name "Bacillus phage phBC6A52" -prefix /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/Plasmid_gDNA_Mix3_gottcha-speDB-v_Bacillus_phage_phBC6A52 -se -zip  -fastq /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/allReads.fastq  /home/edge/EDGE_output/c4b986b78eab9f7598df29d2f4ad1dda/ReadsBasedAnalysis/Taxonomy/report/1_allReads/gottcha-speDB-v/allReads-gottcha-speDB-v.bam

The only difference I can see in the script is "Enterobacteria phage M13" vs. "Bacillus phage phBC6A52".
Any help would be appreciated.

Thanks,
Scott

Lo, Chien-Chi

Oct 15, 2018, 6:05:56 PM
to Scott, edge-users

Hi Scott,

 

This is a database inconsistency issue. In NCBI, the M13 genome FASTA/GenBank record uses the name "Enterobacteria phage M13", but the taxonomy database lists "Escherichia virus M13" as the scientific name.

The GOTTCHA result is based on the name in the FASTA/GenBank file, whereas the extraction script takes the name from the NCBI taxonomy database and checks that it matches the name in the GOTTCHA result. Since the two names differ for M13, the match fails.
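
To make the mismatch concrete, here is a minimal Python sketch. It is not the actual logic of bam_to_fastq_by_taxa.pl, only an illustration of the failure mode. The two sample rows follow the NCBI taxonomy names.dmp column layout, but the tax ID 99999 is a placeholder, and the assumption that the old name is recorded as a synonym is mine. The function names are hypothetical.

```python
# Illustrative rows in the NCBI taxonomy names.dmp layout:
#   tax_id | name_txt | unique name | name class |
# ASSUMPTIONS: tax ID 99999 is a placeholder, and the "synonym" row is
# assumed for illustration; check the real names.dmp before relying on it.
NAMES_DMP = (
    "99999\t|\tEscherichia virus M13\t|\t\t|\tscientific name\t|\n"
    "99999\t|\tEnterobacteria phage M13\t|\t\t|\tsynonym\t|\n"
)


def load_names(text):
    """Return two dicts mapping lowercased name -> tax ID.

    The first holds scientific names only, the second holds every name class.
    """
    scientific, any_name = {}, {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        tax_id, name, _unique_name, name_class = fields[:4]
        any_name[name.lower()] = tax_id
        if name_class == "scientific name":
            scientific[name.lower()] = tax_id
    return scientific, any_name


SCIENTIFIC, ANY_NAME = load_names(NAMES_DMP)


def strict_lookup(name):
    """Match against scientific names only (the failure mode described above)."""
    return SCIENTIFIC.get(name.lower())


def relaxed_lookup(name):
    """Match against every name class, so older names still resolve."""
    return ANY_NAME.get(name.lower())


if __name__ == "__main__":
    reported = "Enterobacteria phage M13"  # name as it appears in the GOTTCHA result
    print("strict :", strict_lookup(reported))
    print("relaxed:", relaxed_lookup(reported))
```

With the sample rows above, the strict lookup finds nothing for the name GOTTCHA reports, while a lookup that also accepts other name classes resolves it to the same tax ID as "Escherichia virus M13".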

 

https://www.ncbi.nlm.nih.gov/nuccore/NC_003287.2

 

This type of issue usually occurs with viral taxa. I think it has been resolved in GOTTCHA2, but I will contact the developer to make sure that is the case.

 

Thanks,

Chienchi


Scott

Oct 16, 2018, 1:22:07 PM
to edge-users

Thank you.
