COBRApy beginner: Building a model segment

203 views
Skip to first unread message

Maria de Wet

unread,
Jun 19, 2020, 9:38:03 AM6/19/20
to cobra pie
I want to eventually do a GEM FBA using COBRApy, but right now I'm just trying to consolidate the metabolic flux calculations technique we were taught in school with this environment and am stuck. 
So, given a segment of a metabolic model (image), stoichiometric relations, and product specifications, we'd build a stoichiometric matrix and solve using numpy or pulp or whatever. I followed the instructions to build a model (from https://cobrapy.readthedocs.io/en/latest/building_model.html, adding the metabolites and reactions), but I can't figure out how to add the specified yields/rates, not to get the unknowns. Once I know this works, I will incorporate the info into an existing GEM. The image attached illustrates the brute-force S-matrix type solutions we've been taught and I'm trying to do the same type of thing using COBRApy. 
Does anyone know of a resource that provides a bit more information about building a model from scratch? Or any tips/advice on where to look? 

S_matrix.JPG

Matthias König

unread,
Jun 20, 2020, 6:46:36 AM6/20/20
to cobra pie
The problem should be easily be encoded in cobrapy.
- create the species
- create the reactions (including the biomass reaction)
- set the objective (to the biomass reaction in your model)
- add additional constraints
- optimize

Best you share your code you have so far, then we can give some feedback on where the problem is.

What cobrapy is doing: It builds the LP optimization problem from the stoichiometric constraints and possible flux bounds you define on the reactions.
In addition you can provide additional constraints to the model.
It looks like you have the additional constraints

r1 = 0.3
r5 = 0.5

There are two possibilities to add these:
1. set bounds on r1 and r5
because the additional constraints are only acting on a single reaction you can encode them as flux bounds, effectively setting the flux to a constant value
r1.lower_bound = 0.3
r1.upper_bound = 0.3
r5.lower_bound = 0.5
r5.lower_bound = 0.5

2. you could add these as an additional constraint
For instance if you have additional equations constrainting linear combinations of fluxes
r1 + r5 = 1.0

Setting the flux bounds for r1 and r5 should allow you to get the solution on paper via cobrapy optimize.

Best Matthias

Maria de Wet

unread,
Jun 22, 2020, 10:38:36 AM6/22/20
to cobra pie
Thank you so much for your swift response!
That makes a lot of sense. I managed to get almost the exact same results using your second method as I got from an S-matrix. I now have a whole new set of questions about COBRApy, but this really helped contextualise it for me and I'll definitely ask again if I don't make headway over the next few weeks. 
Attached is my code in case someone else gets stuck on the same problem. A lot of it is sort of forced, and I'm worried that my shoddy execution may lead to issues down the line when I start adding data and working on bigger models, so if you have time, please would you scan through it and give me any tips you can think of?

I really appreciate this so much, thank you Matthias!! 
example1.png
Main_ferm_pathways.ipynb

Maria de Wet

unread,
Jun 22, 2020, 10:41:27 AM6/22/20
to cobra pie
Here is the relevant part of the code outside of a ipynb file in case the reader doesn't like Jupyter Notebooks: 
"""
from __future__ import print_function
from cobra import Model, Reaction, Metabolite
model2 = Model('Attempt_2')
model2.solver = 'glpk'

# Setting up reactions
# Setting up reactions
def get_rxn(reaction):
    rxn = Reaction(reaction)
    rxn.name = reaction
    rxn.lower_bound = 0.  
    rxn.upper_bound = 1000. 
    return rxn 
r0 = get_rxn('Glucose')
r1 = get_rxn('Biomass')
r2 = get_rxn('DHAP')
r3 = get_rxn('Glycerol')
r4 = get_rxn('Pyruvate')
r5 = get_rxn('Lactate')
r6 = get_rxn('Formate')
r7 = get_rxn('AcCoA')
r8 = get_rxn('Acetane')
r9 = get_rxn('Pyr_rxn_2') # This intermediate reaction is troublesome


# Adding reactions to model
model2.add_reactions([r0])
model2.add_reactions([r1])
model2.add_reactions([r2])
model2.add_reactions([r3])
model2.add_reactions([r4])
model2.add_reactions([r5])
model2.add_reactions([r6])
model2.add_reactions([r7])
model2.add_reactions([r8])
model2.add_reactions([r9])


# Adding constraints:
# Glucose == 1.1*Biomass + DHAP
eq1 = model2.problem.Constraint(
    model2.reactions.Glucose.flux_expression - 1.1*model2.reactions.Biomass.flux_expression - model2.reactions.DHAP.flux_expression,
    lb=0,
    ub=0)
# DHAP == Glycerol + Pyruvate
eq2 = model2.problem.Constraint(
    model2.reactions.DHAP.flux_expression - model2.reactions.Glycerol.flux_expression - model2.reactions.Pyruvate.flux_expression,
    lb=0,
    ub=0)
# Pyruvate == Lactate + Pyruvate_for_rxn_2
eq3 = model2.problem.Constraint(
    model2.reactions.Pyruvate.flux_expression - model2.reactions.Lactate.flux_expression - model2.reactions.Pyr_rxn_2.flux_expression,
    lb=0,
    ub=0)
#Formate = (1/3)Pyruvate
eq4 = model2.problem.Constraint(
    model2.reactions.Formate.flux_expression - (1/3)*model2.reactions.Pyruvate.flux_expression,
    lb=0,
    ub=0)

# Pyruvate_for_rxn_2 == Formate + AcCoA
eq5 = model2.problem.Constraint(
    model2.reactions.Pyr_rxn_2.flux_expression - model2.reactions.Formate.flux_expression - model2.reactions.AcCoA.flux_expression,
    lb=0,
    ub=0)
# Let Acetane == AcCoA
eq6 = model2.problem.Constraint(
    model2.reactions.Acetane.flux_expression - model2.reactions.AcCoA.flux_expression,
    lb=0,
    ub=0)
# β*Biomass - (1/3)*Glycerol + (1/3)*Pyruvate - (1/3)*Lactate == 0 # NADH
eq7 = model2.problem.Constraint(
    0.1*model2.reactions.Biomass.flux_expression - (1/3)*model2.reactions.Glycerol.flux_expression + (1/3)*model2.reactions.Pyruvate.flux_expression - (1/3)*model2.reactions.Lactate.flux_expression,
    lb=0,
    ub=0)
# -1.8*Biomass - (1/3)*DHAP + (2/3)*Pyruvate + (1/2)*Acetane == 0.02 #ATP
eq8 = model2.problem.Constraint(
    -1.8*model2.reactions.Biomass.flux_expression - (1/3)*model2.reactions.DHAP.flux_expression + (2/3)*model2.reactions.Pyruvate.flux_expression + (1/2)*model2.reactions.Acetane.flux_expression,
    lb=0.02,
    ub=0.02)
# Biomass == 0.3
eq9 = model2.problem.Constraint(
    model2.reactions.Biomass.flux_expression,
    lb=0.3,
    ub=0.3)
# Lactate == 0.4
eq10 = model2.problem.Constraint(
    model2.reactions.Lactate.flux_expression,
    lb=0.4,
    ub=0.4)
model2.add_cons_vars(eq1)
model2.add_cons_vars(eq2)
model2.add_cons_vars(eq3)
model2.add_cons_vars(eq4)
model2.add_cons_vars(eq5)
model2.add_cons_vars(eq6)
model2.add_cons_vars(eq7)
model2.add_cons_vars(eq8)
model2.add_cons_vars(eq9)
model2.add_cons_vars(eq10)

model2.objective = 'Biomass'
# print('Answers I was hoping for: Glucose 3.56, Biomass 0.3 DHAP 3.23 Glycerol 1.46 Pyruvate 1.77 Lactate 0.4 Formate 1.37 0.45666667 AcCoA 0.91333333')
model2.optimize()

""" 
>>> 
Glucose3.9600000.0
Biomass0.3000000.0
DHAP3.6300000.0
Glycerol1.6600000.0
Pyruvate1.9700000.0
Lactate0.4000000.0
Formate0.6566670.0
AcCoA0.9133330.0
Acetane0.9133330.0
Pyr_rxn_21.5700000.0

Noah Sprent

unread,
Jun 22, 2020, 12:02:35 PM6/22/20
to cobra pie
Hi Maria,

I'm not an expert like Matthias, so he or others might have a better idea, but I think I can offer a suggestion for what is happening. In the matrix that you have set up the masses aren't balanced, so the system at steady state is consuming/producing metabolites. When solving it using the matrix, as you have, or by directly translating this into constraints, as in your 'attempt2', there is no requirement for metabolites to be balanced. In COBRApy, in order for this to be allowed you need to have boundary reactions. (See here: https://cobrapy.readthedocs.io/en/latest/media.html?highlight=boundary#Boundary-reactions). 

Without boundary reactions it isn't possible for your model to run under the constraints that you have set it, because it isn't 'allowed' to take up any metabolites or export any metabolites. E.g. where does the glucose come from? Where does the biomass go to? The only solution is 0 fluxes all around. When you introduce the requirement for rxn1 and rxn3 to have certain fluxes, the model can't fulfil that requirement, and you get an 'infeasible solution' error. The solution is to add boundary reactions for all of your metabolites, and then the model can run. I've added a couple of cells to your .ipynb which illustrate how you can add boundary reactions for every metabolite to enable your system to run. 

You could investigate which metabolites need to be freely exchanging for the system to run, and for which ones adding a 'free' exchange reaction prevents the model running as you'd want it to. If you are running a model where every metabolite is able to increase or decrease freely, including ATP and NADH, you won't get a very helpful result, as this doesn't seem very biologically-relevant. On the other hand, when you then constrain fluxes, as you are doing for rxn1 (the objective) and rxn5, you are defining the solution space very narrowly, and optimising the objective function no longer has any meaning. 

The key is to find a balance, and you can only know this by knowing what the aim of your model is, if you let us know the question you are trying to answer we might be able to help more. Are you looking to see what the biomass yield on glucose is when a certain proportion of carbon has to go to lactate? What is the larger GEM you are hoping to integrate this into?

Hope that helps, and I'm sure others will weigh in if I'm off the mark!
Noah
Main_ferm_pathways.ipynb

Matthias König

unread,
Jun 23, 2020, 6:09:58 AM6/23/20
to cobra pie
Hi Maria,

here some feedback on your solution.

"""
# Building simple example model

## general comments
- don't use python 2, but python 3; get rid of the __future__ imports
- name functions according to what they do; your `get_rxn` function is setting reaction bounds, a much
better name is set_reaction_bounds; provide default arguments which make the function reusable; use
doc strings for commenting functions; type hints are very helpful for functions, people should use these
- use a IDE instead of notebooks; you have a ton of additional functionality which is making your life easier:
e.g.; working git diffs, working code completion; additional hinting (like deprecation); automatic jumping between
modules; ... In my personal opinion notebooks enforce bad programming patterns and style and should be avoided at
all cost for everything which is a real project. The only real use-case are examples, tutorials and documentation.
But if you build a model use a real IDE for that.
- avoid deprecated functions: e.g. model.add_reaction
- use list/vector functions: i.e. model.add_reactions instead of model.add_reaction
- normally you create the metabolites first and then use them to create reactions
- The reactions are you stoichiometric constraints, you don't have to add these manually! I.e. adding Constraints
manually is normally not required.
- All reactions are irreversible (via the bounds). This should work, because your possible flux solution
only has positive values (but this is important to realize)
- some issues with your graph (the glu is defined in the wrong direction, should be internally with respective changes
in the matrix, see reversed glu signs in metabolites; biomass should consume glucose so the r1 needs negative sign)
- you don't have exchange reactions, but just drop stoichiometric coefficients and metabolites from the matrix; this is extremely bad
practise and results in mass/charge balance problems; you need exchange reactions for gly, lac, for, ace (similar to r0
- add a biomass reaction!). If you write a correct metabolic network graph most models are not solvable, but
require additional exchange reactions to allow for balance.
- export your model as SBML! You can see missing information and problems with your model (e.g. compartments are missing);
this is highly recommended so that the model can be exchanged. For instance I used the SBML to create the visualization.
- the graph misses some information which exist in the stoichiometric matrix. Much better to have a clean stoichiometric
model first (metabolite x which does not appear in graph). I encoded to the best of my knowledge, but the given example is not very well specificed.
"""
from typing import Dict

from cobra import Model, Reaction, Metabolite
model = Model('maria_example')
model.solver = 'glpk'

metabolites = [
Metabolite("glu", compartment="cell"),
Metabolite("co2", compartment="cell"),
Metabolite("nadh", compartment="cell"),
Metabolite("atp", compartment="cell"),
Metabolite("biomass", compartment="cell"),
Metabolite("dhap", compartment="cell"),
Metabolite("gly", compartment="cell"),
Metabolite("pyr", compartment="cell"),
Metabolite("lac", compartment="cell"),
Metabolite("for", compartment="cell"),
Metabolite("acoa", compartment="cell"),
Metabolite("x", compartment="cell"),
Metabolite("ace", compartment="cell"),
]
model.add_metabolites(metabolites)


def create_reaction(model: Model, reaction_id: str, metabolites: Dict[str, float],
lower_bound: float = 0.0, upper_bound: float=1000.0):
""" Setting reaction bounds for reaction.

By default irreversible bounds.

:param reaction_id:
:param lower_bound:
:param upper_bound:
:return: cobra.reaction
"""
rxn = Reaction(reaction_id)
model.add_reactions([rxn])
rxn.name = reaction_id
rxn.lower_bound = lower_bound
rxn.upper_bound = upper_bound
rxn.add_metabolites(metabolites)
return rxn

EX_glu = create_reaction(model, 'EX_glu', {"glu": 1.0})
r1 = create_reaction(model, 'r1', {"glu": -1.1, "atp": -1.8, "biomass": 1.0, "co2": 0.1})
r2 = create_reaction(model, 'r2', {"glu": -1.0, "atp": -1.0/3.0, "dhap": 1.0})
r3 = create_reaction(model, 'r3', {"dhap": -1.0, "nadh": -1.0/3.0, "gly": 1.0})
r4 = create_reaction(model, 'r4', {"dhap": -1.0, "pyr": 1.0, "atp": 2.0/3.0, "nadh": 1.0/3.0})
r5 = create_reaction(model, 'r5', {"pyr": -1.0, "nadh": -1.0/3.0, "lac": 1.0, })
r6 = create_reaction(model, 'r6', {"pyr": -1.0, "x": -1.0/3.0, "acoa": 1.0})
r7 = create_reaction(model, 'r7', {"acoa": -1.0, "x": 1.0, "for": 1.0})
r8 = create_reaction(model, 'r8', {"acoa": -1.0, "ace": 1.0, "atp": 1.0/2.0})
EX_co2 = create_reaction(model, 'EX_co2', {"co2": -1.0})
EX_gly = create_reaction(model, 'EX_gly', {"gly": -1.0})
EX_lac = create_reaction(model, 'EX_lac', {"lac": -1.0})
EX_for = create_reaction(model, 'EX_for', {"for": -1.0})
EX_ace = create_reaction(model, 'EX_ace', {"ace": -1.0})
rbiomass = create_reaction(model, 'rbiomass', {"biomass": -1.0})

model.objective = 'rbiomass'
# additional constraints
r1.lower_bound = 0.3
r1.upper_bound = 0.3
r5.lower_bound = 0.4
r5.upper_bound = 0.4

# store model as SBML for visualization
from cobra.io.sbml import write_sbml_model
write_sbml_model(model, "maria_example.xml", f_replace=None)

sol = model.optimize()
print(sol)
print(sol.status)
print(sol.fluxes)

# FVA
from cobra.flux_analysis import flux_variability_analysis
fva = flux_variability_analysis(model)
print(fva)


This gives a results of
Name: fluxes, dtype: float64
           minimum   maximum
EX_glu    3.170000  3.170000
r1        0.300000  0.300000
r2        2.840000  2.840000
r3        1.220000  1.220000
r4        1.620000  1.620000
r5        0.400000  0.400000
r6        1.220000  1.220000
r7        0.406667  0.406667
r8        0.813333  0.813333
EX_co2    0.030000  0.030000
EX_gly    1.220000  1.220000
EX_lac    0.400000  0.400000
EX_for    0.406667  0.406667
EX_ace    0.813333  0.813333
rbiomass  0.300000  0.300000

Please check the stoichiometry of the reactions if this is what was meant in the model. This is the only point were problems could exist.

best Matthias

On Friday, June 19, 2020 at 3:38:03 PM UTC+2, Maria de Wet wrote:
Base: maria_example.png
maria_example.xml
maria_example.xml

Maria de Wet

unread,
Jun 23, 2020, 10:53:15 AM6/23/20
to Matthias König, cobra pie

Matthias,

I honestly don't know how to thank you. This is immeasurably helpful. 
I'm still combing through your code line by line to make sure I understand exactly what everything does before I fiddle with the stoichiometry. But working from such a beautiful basis is going to prevent a plethora of compounding bad practice coding that would probably have resulted in months' worth of issues along the line! Your advice is also extremely good -- some of which I immediately applied (like specifying function names, and I will definitely need to brush up on PEP 8 again!!), others I will absolutely start with when I'm comfortable enough (like trying an IDE), and the rest really helps me understand how you managed to write everything so elegantly. Again, thank you! 

Lastly: Is there a more updated source of info for COBRApy than ReadTheDocs so that I can avoid deprecated functions? And maybe a place where I can look up naming conventions (I'm not sure for example why we use EX_ for boundary reactions)? 

Noah: Thank you for your input! I completely forgot about that insignificant little "conservation of mass" detail... Anyway, thanks. So I spent the whole day tinkering with the boundary reactions and getting SOMETHING! Yay! But after looking at Matthias' code I realised everything else was so riddled with issues that it's a small wonder it didn't result in anything useful (expect perhaps marginally improving my understanding and level of comfort with COBRApy). This little metabolic segment isn't actually what I'm working with, it was just a tutorial in my studies (chemical engineering) that I understand relatively well and wanted to use it to familiarise myself with COBRApy. What I'm working towards for my postgraduate studies (that only started this year) is cobalamin (vitamin B12) synthesis in eukaryotes through genetic engineering (the 15 year plan is to have a nutritionally complete plant that Mat Damon can live off indefinitely on Mars). Biosynthesis of B12 is a relatively complex process (for a vitamin) involving about 20 genes and a lot of resources (e.g. it requires 10 molecules of ATP/ molecule of adenosylcobalamin). So I had this idea that I could use GEM (of S. cerevisiae for example) to model proposed gene insertions to figure out how resources could theoretically be reallocated for this to happen. My starting point is to try to model the GEM for E. coli in which the heterogeneous genes have been inserted (which has been physically achieved)* before moving on to my own theoretical B12pathway-inserted-GEM. I don't know whether COBRApy is right for this, but all my research suggests it's definitely the best open-source python based tool available... And it clearly has an AMAZING community!








--
You received this message because you are subscribed to a topic in the Google Groups "cobra pie" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cobra-pie/NY0kchbjffs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cobra-pie+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cobra-pie/800489f2-bc86-47f0-b813-fb30f6e137dco%40googlegroups.com.


--
Kind regards

Dr Maria MM de Wet
082 418 7722
Reply all
Reply to author
Forward
0 new messages