Priority list for Debugging EcoCyc


zuc...@research.dfci.harvard.edu

Apr 20, 2005, 6:06:36 PM
to Debuggin...@googlegroups.com
Hello folks,

To summarize the conversation with Markus Krummenacker, it seems we
all agree on the problems at hand:

1. Biomass composition should be recorded.
2. Some reactions are not mass balanced:
3. Some reactions are atomically balanced for C, S, N, and P: 694
4. Some reactions contain generic metabolites that cannot be
resolved to instances.
5. Some reactions contain generic metabolites that can be resolved to
specific instances.
6. Some metabolites can be produced but not consumed.
7. Vice versa: some metabolites can be consumed but not produced.

The question is: how many reactions and/or metabolites are affected by
each problem?

To answer this question, we need to run some queries against the EcoCyc
database.
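
As a rough illustration of items 6 and 7, a dead-end query could be sketched as follows. This is a standalone Python sketch, not the Lisp code attached later in this thread; the reaction table is a made-up toy network, and every reaction is assumed to run only in the direction written.

```python
# Hypothetical toy network: reaction id -> (substrates, products).
# The reaction and compound names are illustrative, not EcoCyc data.
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"D"}, {"B"}),
}

def find_dead_ends(reactions):
    """Return (produced_only, consumed_only) sets of metabolites.

    Assumes each reaction is irreversible in the direction written.
    """
    produced = set()
    consumed = set()
    for substrates, products in reactions.values():
        consumed |= substrates
        produced |= products
    return produced - consumed, consumed - produced

produced_only, consumed_only = find_dead_ends(REACTIONS)
print(sorted(produced_only))  # ['C']
print(sorted(consumed_only))  # ['A', 'D']
```

In a real database, reversible reactions and transport steps would have to be handled before metabolites reported this way could be counted as genuine gaps.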

More on this in the next email.

Jeremy

Jeremy Zucker

Apr 20, 2005, 6:37:13 PM
to Debuggin...@googlegroups.com

On Apr 20, 2005, at 6:06 PM, zuc...@research.dfci.harvard.edu wrote:

>
> Hello folks,
>
> To summarize the conversation with Markus Krummenacker, it seems we
> all agree on the problems at hand:
>
> 1. Biomass composition should be recorded.

Attached in biomass.par is the biomass composition from Bernhard
Palsson's model.

The columns should read:
Stoichiometry | JR904 id | EcoCyc id

Note that some of the EcoCyc ID's are missing, unknown, or just wrong.
This needs to get fixed before we can claim we have a working model.
Another possible source of biomass composition could come from the
"Nutrient-related Analysis of a Pathway/Genome database" by Karp and
Romero.
In this article, they describe a list of "essential compounds" that are
necessary for biomass growth, although they do not include relative
proportions.
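
A minimal sketch of how a three-column file like this might be checked for unresolved IDs is given below. It assumes whitespace-separated fields, which is only a guess at the layout of biomass.par. The first two sample rows are the ones quoted later in this thread; the `cmpdX` and `cmpdY` rows and the `known_ids` set are invented for illustration.

```python
# Sample text: rows 1-2 are quoted in this thread, rows 3-4 are made up.
SAMPLE = """\
-0.0084000 LPS liposaccarides_genes_rfa_reactions_missing_from_Biocyc
+1.000000 Biomass BIOMASS
-0.25 cmpdY CPD-Y
-0.5 cmpdX
"""

def parse_biomass(text):
    """Parse 'stoichiometry model_id [ecocyc_id]' rows into tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        coeff = float(fields[0])
        model_id = fields[1]
        ecocyc_id = fields[2] if len(fields) > 2 else None
        rows.append((coeff, model_id, ecocyc_id))
    return rows

def unresolved(rows, known_ids):
    """Return model ids whose EcoCyc id is missing or not in known_ids."""
    return [model_id for _, model_id, ecocyc_id in rows
            if ecocyc_id not in known_ids]

rows = parse_biomass(SAMPLE)
# known_ids stands in for the set of frame ids actually present in the database.
print(unresolved(rows, known_ids={"CPD-Y"}))  # ['LPS', 'Biomass', 'cmpdX']
```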


> 2. Some reactions are not mass balanced:
>



> 3. Some reactions atomically balanced for C, S, N, and P: 694
>

Attached is a list of balanced and unbalanced reactions. The code that
performs the query is included at the top of the file.
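
For readers without the attachment, the kind of check involved can be sketched in Python as follows. This is not the attached query: the formula table is a small hand-written stand-in for the chemical formulas stored in the database, and the function names are hypothetical.

```python
from collections import Counter

# Hand-written element counts for a few compounds (neutral formulas).
FORMULAS = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "ATP": {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3},
    "ADP": {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2},
    "glucose-6-phosphate": {"C": 6, "H": 13, "O": 9, "P": 1},
}

def element_totals(side):
    """Sum element counts over one side of a reaction ({compound: coeff})."""
    totals = Counter()
    for compound, coeff in side.items():
        for element, n in FORMULAS[compound].items():
            totals[element] += coeff * n
    return totals

def imbalance(left, right, elements=("C", "S", "N", "P")):
    """Return {element: left - right} for the checked elements that differ."""
    l, r = element_totals(left), element_totals(right)
    return {e: l[e] - r[e] for e in elements if l[e] != r[e]}

# Balanced: glucose + ATP -> glucose-6-phosphate + ADP
balanced = imbalance({"glucose": 1, "ATP": 1},
                     {"glucose-6-phosphate": 1, "ADP": 1})
# Unbalanced: the phosphate donor is missing from the left side
unbalanced = imbalance({"glucose": 1}, {"glucose-6-phosphate": 1})
print(balanced)    # {}
print(unbalanced)  # {'P': -1}
```

Restricting the check to C, S, N, and P sidesteps H and O, whose counts often differ because of protonation state and implicit water.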

>
> The question is: how many reactions and/or metabolites are affected by
> each problem?
>
> To answer this question, we need to run some queries against the EcoCyc
> database.
>

The functions that must be loaded in order for the queries to work are
also attached in debugging-the-bug.lisp

To load the functions, type (load "debugging-the-bug") into
pathway-tools


> More on this in the next email.
>
> Jeremy
>
>
Jeremy Zucker
Bioinformatics Specialist
Dana-Farber Cancer Institute
url: http://research.dfci.harvard.edu
email: zuc...@research.dfci.harvard.edu
work: 617-632-6852
cell: 617-833-3196
Attachments: biomass.par, debugging-the-bug.lisp, balanced-rxns.zip

Xiaoxia (Nina) Lin

Apr 21, 2005, 1:39:21 AM
to Debuggin...@googlegroups.com
I wonder whether similar efforts are also being spent on the MetaCyc. We
are developing methods for constructing metabolic flux models from genome
annotations using the MetaCyc database. So it'll be really helpful if
MetaCyc can be improved on the same aspects.

Thanks,
Xiaoxia (Nina)

Markus Krummenacker

Apr 21, 2005, 11:05:07 PM
to Debuggin...@googlegroups.com, Pedro Romero, k...@ai.sri.com
Jeremy Zucker writes:
> > Hello folks,

Thanks Jeremy, for setting up this mailing list.


> > To summarize the conversation with Markus Krummenacker, it seems we
> > all agree on the problems at hand:
> >
> > 1. Biomass composition should be recorded.
>
> Attached in biomass.par is the Biomass composition from Bernhard
> Palsson's model
>
> The columns should read:
> Stoichiometry JR904 id EcoCyc id
>
> Note that some of the EcoCyc ID's are missing, unknown, or just
> wrong.
> This needs to get fixed before we can claim we have a working model.

As far as I can see from this list, the only 2 items that are not
present in EcoCyc are:

-0.0084000 LPS liposaccarides_genes_rfa_reactions_missing_from_Biocyc
+1.000000 Biomass BIOMASS

The LPS stuff is actually an interesting question. As far as I know,
the biochemistry of their synthesis is not yet fully worked out and is
still under investigation. So I'd be interested in knowing how
Palsson dealt with this in his model...


> Another possible source of biomass composition could come from the
> "Nutrient-related Analysis of a Pathway/Genome database" by Karp and
> Romero.
> In this article, they describe a list of "essential compounds" that are
> necessary for biomass growth, although they do not include relative
> proportions.

So Pedro, where exactly did you get the information from, for the
essential cpd list ?

One would think that by now, there would be some literature on the
topic, maybe even some kind of a review. (I haven't searched for
anything yet myself.)

--
Regards
Markus Krummenacker

Markus Krummenacker

Apr 21, 2005, 11:08:03 PM
to Debuggin...@googlegroups.com
Xiaoxia (Nina) Lin writes:
>
> I wonder whether similar efforts are also being spent on the MetaCyc. We
> are developing methods for constructing metabolic flux models from genome
> annotations using the MetaCyc database. So it'll be really helpful if
> MetaCyc can be improved on the same aspects.


I think we will first focus on EcoCyc, because this seems closest to
complete. At some later point, we'd obviously be interested in
extending this to everything else.

Pedro R. Romero

Apr 22, 2005, 2:44:09 PM
to Markus Krummenacker, Debuggin...@googlegroups.com, Pedro Romero
Markus Krummenacker wrote:
(snip)...


> > Another possible source of biomass composition could come from the
> > "Nutrient-related Analysis of a Pathway/Genome database" by Karp and
> > Romero.
> > In this article, they describe a list of "essential compounds" that are
> > necessary for biomass growth, although they do not include relative
> > proportions.
>
> So Pedro, where exactly did you get the information from, for the
> essential cpd list ?
>
> One would think that by now, there would be some literature on the
> topic, maybe even some kind of a review. (I haven't searched for
> anything yet myself.)

  
We defined the essential compound list ourselves. It was simply a matter of including the building blocks for everything the cell needs in order to survive: for example, amino acids, nucleotides, and nucleosides as building blocks for proteins and nucleic acids, as well as building blocks for the cell membrane and the outer cell wall. To get these latter building blocks I mainly used information from the E. coli and Salmonella "bible" (Neidhardt, ed. Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press).

Our original work was not quantitative: we wanted to make a qualitative assessment of which species needed to be present in the environment to be able to produce these essential compounds, given the reaction network present in EcoCyc. Thus, there were no indications of proportions, and we did not need to use stoichiometric information. We just assumed that a metabolite could be produced if the required reactants and cofactors were present at any moment.
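
One way to picture this qualitative assumption is a fixed-point expansion over the reaction network, sketched below in Python. This is only an illustration of the idea, not the code used in the original analysis; the reactions, nutrients, and essential compounds are a made-up toy example, and cofactors are simply listed among the inputs.

```python
def producible(nutrients, reactions):
    """Expand the set of available compounds until nothing new appears.

    A reaction fires as soon as all of its inputs are available;
    stoichiometry is ignored, as in a purely qualitative analysis.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in reactions:
            if inputs <= available and not outputs <= available:
                available |= outputs
                changed = True
    return available

# Hypothetical toy network: (inputs, outputs) pairs.
TOY_REACTIONS = [
    ({"A"}, {"B"}),
    ({"B", "X"}, {"C"}),
    ({"C"}, {"D"}),
]
ESSENTIAL = {"B", "D"}  # stand-in for an essential compound list

missing_without_x = ESSENTIAL - producible({"A"}, TOY_REACTIONS)
missing_with_x = ESSENTIAL - producible({"A", "X"}, TOY_REACTIONS)
print(sorted(missing_without_x))  # ['D']
print(sorted(missing_with_x))     # []
```

Comparing the two runs shows how such an expansion can flag which nutrients must be supplied for every essential compound to be reachable.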

Hope this clarifies things...

Cheers,

Pedro