dgenome start : Protein is not present

K Kosnicki

Jan 29, 2018, 10:31:36 AM
to DuctApe
Hi Marco,

Now I am getting this error when running dgenome start. 

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/u12923/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/genome/pangenome.py", line 331, in run
    if not self.serialBBH():
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/genome/pangenome.py", line 225, in serialBBH
    neworg = self._prot2orgs[otherprotein]
KeyError: u'gnl|fig|1282.2364.peg.412'

09:48:06 - Protein gnl|fig|1282.2364.peg.412 is not present yet!
Traceback (most recent call last):
  File "/Users/u12923/anaconda2/bin/dgenome", line 477, in <module>
    ret = options.func(options, wdir, project)
  File "/Users/u12923/anaconda2/bin/dgenome", line 109, in dstart
    if not doPanGenome(project,infiles,options.cpu,options.prefix,options.matrix,options.evalue):
  File "/Users/u12923/anaconda2/bin/dgenome", line 242, in doPanGenome
    gen.addPanGenome(pang.orthologs)
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/storage/SQLite/database.py", line 816, in addPanGenome
    raise Exception('This Protein (%s) is not present yet!'%prot_id)
Exception: This Protein (gnl|fig|1282.2364.peg.412) is not present yet!


I assembled and annotated reads in PATRIC from the genomic sequences in NCBI's SRA archive. From there I was able to obtain the protein sequences, which I used as my genomes. I then submitted those sequences to the KAAS annotation service and used that output as the kaas inputs in DuctApe. Do you know why I might be getting this error?


Best,

Kassi

Marco Galardini

Jan 29, 2018, 12:31:22 PM
to K Kosnicki, DuctApe
Hi Kassi,

this error seems to be due to some undocumented changes in the way blast handles Fasta headers. That will need some changes on my part, hopefully relatively fast. Please accept my apologies for this disruption.

Best,
Marco


Marco Galardini

Jan 30, 2018, 12:56:21 PM
to K Kosnicki, DuctApe
Hi Kassi,

I believe the issue has now been fixed; it seems that newer versions of blast have changed the way they handle protein IDs with pipe (i.e. "|") chars in them. If you can try to run the current master branch we could validate that the fix works for you as well, and I can then push a new version online.
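To see whether a proteome is likely to be affected, one can list the FASTA IDs that contain a pipe. The following is only a minimal sketch: `ids_with_pipes` is a hypothetical helper, not part of DuctApe, and the example sequences are made up (only the first ID is taken from the traceback above).

```python
def ids_with_pipes(fasta_lines):
    """Return header IDs (text up to the first whitespace) containing '|'."""
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and "|" in fields[0]:
                flagged.append(fields[0])
    return flagged


# Illustrative input; with a real file, pass open("proteome.fasta") instead.
example = [
    ">gnl|fig|1282.2364.peg.412 hypothetical protein",
    "MKTAYIAKQR",
    ">protein_2 another protein",
    "MSLLTEVETY",
]
print(ids_with_pipes(example))
```

An empty list suggests the IDs in that file are not the ones being re-parsed.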

Find the master branch here:
https://github.com/combogenomics/DuctApe

Thanks for spotting this bug.
Best,
Marco

K Kosnicki

Feb 8, 2018, 3:10:38 PM
to DuctApe
Hi Marco,

I'm still getting an error:

Here are the outputs from the "sed" commands you wrote for me:

u12923-2k3:UUI_Loyola_DuctApe u12923$ sed -i 's/fig|//g' proteomes/*.fasta

sed: 1: "proteomes/Bbreve_UMB008 ...": extra characters at the end of p command

u12923-2k3:UUI_Loyola_DuctApe u12923$ sed -i 's/fig|//g' kass_output/*.fasta

sed: 1: "kass_output/*.fasta": invalid command code k



dgenome start error:

u12923-2k3:UUI_Loyola_DuctApe u12923$ dgenome start -n 4


Item 1904 on 19460 total
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/u12923/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/genome/pangenome.py", line 331, in run
    if not self.serialBBH():
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/genome/pangenome.py", line 225, in serialBBH
    neworg = self._prot2orgs[otherprotein]
KeyError: u'gnl|fig|1282.2364.peg.412'

12:00:27 - Protein gnl|fig|1282.2364.peg.412 is not present yet!
Traceback (most recent call last):
  File "/Users/u12923/anaconda2/bin/dgenome", line 477, in <module>
    ret = options.func(options, wdir, project)
  File "/Users/u12923/anaconda2/bin/dgenome", line 109, in dstart
    if not doPanGenome(project,infiles,options.cpu,options.prefix,options.matrix,options.evalue):
  File "/Users/u12923/anaconda2/bin/dgenome", line 242, in doPanGenome
    gen.addPanGenome(pang.orthologs)
  File "/Users/u12923/anaconda2/lib/python2.7/site-packages/ductape/storage/SQLite/database.py", line 816, in addPanGenome
    raise Exception('This Protein (%s) is not present yet!'%prot_id)
Exception: This Protein (gnl|fig|1282.2364.peg.412) is not present yet!

Jennifer Colquhoun

Jul 11, 2019, 11:13:06 AM
to DuctApe
Hi,

I am having a similar error with the new Blast data output. I have double checked that I have the newest DuctApe loaded. 

Any suggestions?

Thanks,
Jen
ductape.log

Marco Galardini

Jul 11, 2019, 5:38:42 PM
to Jennifer Colquhoun, DuctApe
Hi Jen,

sorry to hear that you are encountering problems with blast. From the look of the error it seems to be entirely blast's fault. Can you try renaming the proteins in your input fasta files? Specifically, I think that blast is crashing because it's trying to parse all the "|" characters; if you rename each fasta entry in the form "protein_X", where X is a counter, that might work.
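The renaming could be sketched roughly as follows. `rename_fasta` and its `prefix` parameter are hypothetical names, not DuctApe functions. Using a different prefix per genome is an assumption, made so that IDs stay unique across proteomes, and the returned mapping lets the original headers be recovered later. Any other input keyed on the old IDs (for example KAAS output) would presumably need the same renaming.

```python
def rename_fasta(lines, prefix="protein_"):
    """Replace every FASTA header with '<prefix><counter>'.

    Returns the rewritten lines and a dict mapping new ID -> old header.
    """
    renamed = []
    mapping = {}
    counter = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            counter += 1
            new_id = "%s%d" % (prefix, counter)
            mapping[new_id] = line[1:]
            renamed.append(">" + new_id)
        else:
            # Sequence lines are passed through unchanged.
            renamed.append(line)
    return renamed, mapping


# Illustrative records; the sequences are made up.
records = [
    ">lcl|000000001|psn|pAB5075UW_000006|chromosomal replication",
    "MSLWQQCLAR",
    ">lcl|000000002|psn|example|second entry",
    "MKFTVEREHL",
]
new_lines, id_map = rename_fasta(records, prefix="genomeA_")
print("\n".join(new_lines))
```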

Hope this helps,
Marco



--
Marco Galardini

Jen Colquhoun

Jul 11, 2019, 6:26:30 PM
to Marco Galardini, DuctApe
Hi Marco, 

Thanks for the reply. I appreciate that you continue to help users of your program. 

Just to make sure I understand what you're asking: the fasta file currently looks like this:

>lcl|000000001|psn|pAB5075UW_000006|chromosomal replication 

You want me to delete each " | " character in the line, so ultimately each protein will look like:

>lcl 000000001 psn pAB5075UW_000006 chromosomal replication 

Thanks,
Jen

Marco Galardini

Jul 12, 2019, 9:33:35 AM
to Jen Colquhoun, DuctApe
Hi,

that would not work well, since the fasta headers are parsed until the first space character; you could try replacing the "|" with something like an underscore "_". On the command line you can do that with:

sed -i 's/|/_/g' YOUR_FILE.fasta
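Note that the `sed -i` form above is GNU sed syntax. BSD sed on macOS expects a backup suffix after `-i` (an empty one is written as `-i ''`), which is most likely what produced the "extra characters at the end of p command" and "invalid command code k" errors earlier in this thread. A portable alternative is a few lines of Python. This is a sketch only, with `replace_pipes` a hypothetical helper; unlike the sed command it touches header lines only.

```python
def replace_pipes(lines, replacement="_"):
    """Replace '|' with `replacement` in FASTA header lines only."""
    return [
        line.replace("|", replacement) if line.startswith(">") else line
        for line in lines
    ]


# Header taken from the message above; the sequence is made up.
records = [
    ">lcl|000000001|psn|pAB5075UW_000006|chromosomal replication",
    "MSLWQQCLAR",
]
fixed = replace_pipes(records)
print(fixed[0])
```

With a real file, one would read the lines from the input and write the result to a new file rather than overwriting the original.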

Hope this helps,
Marco



--
Marco Galardini

Jen Colquhoun

Jul 13, 2019, 4:23:48 PM
to Marco Galardini, DuctApe
Thanks again! I was able to run the program analysis for my files. 

I did want to ask one more thing. It seems the KEGG mapping went fine, since it produced several .png maps. However, when I try to open them, my computer says the files are corrupt. The same happens with the corresponding .html files. Any suggestions as to why, and how I might be able to fix this?

Jen

Marco Galardini

Jul 17, 2019, 9:39:51 AM
to Jen Colquhoun, DuctApe
Hi,

could you send me one of those files please?

Best,
Marco
--
Marco Galardini

Jen Colquhoun

Jul 17, 2019, 11:27:31 AM
to Marco Galardini, DuctApe
Sure thing. Here’s just a few examples attached that I haven’t been able to open. 
map00010.html
5075O_3Hi.gml
map00010.png

Marco Galardini

Jul 17, 2019, 12:26:17 PM
to Jen Colquhoun, DuctApe
Hi,

the html and gml files look fine (i.e. you can open them with a text editor), but the png is indeed corrupted. There might have been an error on the KEGG side, some weird glitch in DuctApe, or a network issue. Can you try re-running the offending command and see if the problem persists?
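A quick way to tell which maps are affected is to check the first bytes of each file: every valid PNG starts with the same 8-byte signature. The sketch below uses hypothetical helper names (`looks_like_png`, `describe`), and the "markup" guess is only a heuristic for a file that contains text such as an error page instead of image data.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed first 8 bytes of any PNG file


def looks_like_png(data):
    """True if the byte string starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE


def describe(data):
    """Rough classification of a file's content from its leading bytes."""
    if not data:
        return "empty"
    if looks_like_png(data):
        return "png"
    if data.lstrip()[:1] == b"<":
        return "markup"  # heuristic: HTML/XML saved under a .png name
    return "unknown"


# With a real file: describe(open("map00010.png", "rb").read(16))
print(describe(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))
print(describe(b"<html><body>error</body></html>"))
```

A correct signature does not prove the whole file is intact (it may still be truncated), but a wrong one confirms the download failed.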

Best,
Marco


--
Marco Galardini