Re: Proposal for distributions within SBML

Joerg R. Weimar

May 6, 2005, 4:37:19 AM

to SBML Discussion List

Hi Colin,

Introducing random variables is useful for many projects. I find it
problematic to introduce only a fixed set of distribution functions. How
about defining a random variable with a PDF or CDF given by a MathML
function which can then be defined by the user or compared to a standard
set of functions? A second problem is that you only included
one-dimensional distributions, and not multi-dimensional distributions.

I think we should look more closely at MathML and use it to
specify random variables. For example, the specification mentions the
types random_variable, continuous_random_variable, and
discrete_random_variable,

so I think one can declare

<declare type="discrete_random_variable">
<ci> X </ci>
</declare>

or

<ci type="discrete_random_variable"> X </ci>

but I have not found out how the distribution can be declared. In any
case, I would prefer to have a construct where I can define X as a
discrete random variable with support {0,1} and
p(0) = 0.1 and p(1) = 0.9

And similarly for continuous variables.

X a continuous random variable with support [0,1) and
p(X=x) = 2 x

(just as an example).

Best regards, Jörg Weimar.

Colin Gillespie wrote:
> Hi All,
>
> Since the list has been relatively quiet in the past few days ;) I thought that now would be the time to request thoughts on this proposal for distributions within SBML. There are a number of issues still to be resolved, so hopefully I'll get lots of feedback on the document.
>
> I'm at the Hackathon, so feel free to flame me in person ;)
>
> The proposal can be found at http://www.basis.ncl.ac.uk/distributions.pdf
>
> Thanks
>
> Colin

--
Privatdozent Dr. Jörg R. Weimar, Arbeitsgruppe Bioinformatik,
Institut für Informationssysteme, TU-Braunschweig
J.We...@tu-bs.de, http://www.jweimar.de
Tel. +49-531-2096237 Mühlenpfordtstr.23 D-38106 Braunschweig

Colin Gillespie

May 7, 2005, 11:05:55 PM

to SBML Discussion List

Hi All,

Thinking about distributions a bit more during a long stroll in a Tokyo garden ;) What about trying something like this?

The proposal would still include only a small number of distributions: perhaps a few more than have been defined so far, but a small number nevertheless.

Additional distributions could then be defined using a combination of a controlled vocabulary (CV) and a functionDefinition.

For example, the geometric distribution:

<functionDefinition id="geometricDistribution">
<some mathML>
</functionDefinition>

Now the idea would be that if your tool supported the CV for distributions, it would know what a geometric distribution is and could implement its own method of calculating it.

If your tool didn't bother with the CV, it could still calculate the required quantity using the functionDefinition provided.

Obviously, wherever the CV terms are stored, an associated function definition would be stored alongside them.
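A minimal Python sketch of this "native if recognised, functionDefinition otherwise" idea. Everything here is hypothetical: the CV term string "geometricDistribution", the NATIVE registry and the evaluate() helper are illustrative names, and a plain Python function stands in for the MathML body of the functionDefinition. The geometric PMF assumed is P(K = k) = (1 - p)^k p for k = 0, 1, 2, ... (failures before the first success); other conventions count trials instead.

```python
from typing import Callable, Dict, Tuple

Pmf = Callable[[int, float], float]

# Hypothetical registry of CV terms this tool recognises, mapped to its own
# implementation. A real tool might call an optimised library routine here.
NATIVE: Dict[str, Pmf] = {
    "geometricDistribution": lambda k, p: (1.0 - p) ** k * p,
}


def geometric_fd(k: int, p: float) -> float:
    """Stand-in for the MathML body of the geometricDistribution functionDefinition."""
    return (1.0 - p) ** k * p


def evaluate(cv_term: str, fallback: Pmf, k: int, p: float) -> Tuple[float, str]:
    """Use the tool's own method if it knows the CV term, else the supplied definition."""
    native = NATIVE.get(cv_term)
    if native is not None:
        return native(k, p), "native"
    return fallback(k, p), "functionDefinition"


print(evaluate("geometricDistribution", geometric_fd, 2, 0.5))  # (0.125, 'native')
print(evaluate("someUnknownTerm", geometric_fd, 2, 0.5))        # (0.125, 'functionDefinition')
```

Both paths should give the same number; the CV term only lets a tool skip interpreting the MathML.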

Just some thoughts.

Colin

Howard Salis

May 5, 2005, 10:44:28 PM

to SBML Discussion List

Hello,
I think this would be very useful, especially for representing cell
division events and transcriptional & translational elongation events.

One suggestion: Maybe include the parameter name alongside the
parameter? Something like

<math xmlns="http://www.w3.org/1998/Math/MathML">
<csymbol encoding="text"
definitionURL="http://www.sbml.org/sbml/symbols/uniformRandom">
<cn name = "a">0</cn>
<cn name = "b">1</cn>
</csymbol>
</math>

Just to prevent confusion if the numbers are somehow switched in order.
(I think all parameters should have some name.)
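To illustrate how a tool might consume named parameters, here is a small Python sketch using the standard library's ElementTree. It assumes the non-standard name attribute on cn exactly as suggested above (this is a proposal, not part of MathML), and named_params() is a hypothetical helper. The parameters are deliberately listed in reverse order to show that lookup by name no longer depends on position.

```python
import random
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"


def named_params(xml_text: str) -> dict:
    """Collect the csymbol's <cn> children into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    csymbol = root.find(MATHML + "csymbol")
    return {cn.get("name"): float(cn.text) for cn in csymbol.findall(MATHML + "cn")}


# Same parameters as the snippet above, but listed in the opposite order.
SNIPPET = """<math xmlns="http://www.w3.org/1998/Math/MathML">
<csymbol encoding="text"
definitionURL="http://www.sbml.org/sbml/symbols/uniformRandom">
<cn name="b">1</cn>
<cn name="a">0</cn>
</csymbol>
</math>"""

params = named_params(SNIPPET)
print(params)  # {'b': 1.0, 'a': 0.0}
x = random.uniform(params["a"], params["b"])  # order in the XML no longer matters
```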

Thanks,

-Howard Salis

Darren Wilkinson

May 6, 2005, 4:35:10 PM

to SBML Discussion List

Joerg R. Weimar wrote:
> Introducing random variables is useful for many projects.

I agree - it is not supposed to be in any way restricted to discrete
and/or stochastic models.

> I find it
> problematic to introduce only a fixed set of distribution functions. How
> about defining a random variable with a PDF or CDF given by a MathML
> function which can then be defined by the user or compared to a standard
> set of functions?

It is easy to define new distributions. It is not so easy to simulate
random quantities from those distributions. If it is so easy, why
doesn't the GSL allow one to give a pdf and get a random quantity from
it? See later for further details.
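To make the point concrete, a Python sketch of plain rejection sampling, one generic way to sample from a user-supplied pdf. This is a standard textbook technique, not something the proposal or the GSL provides, and rejection_sample() is a hypothetical name. Note the extra information it needs beyond the pdf: a bounded support [lo, hi] and an upper bound on the density. For densities with unbounded support or no known bound this simple scheme does not apply, and a loose bound makes it slow.

```python
import random
from typing import Callable


def rejection_sample(pdf: Callable[[float], float], lo: float, hi: float,
                     bound: float, rng: random.Random) -> float:
    """One draw from a density on [lo, hi], assuming pdf(x) <= bound everywhere.

    The expected number of proposals per accepted draw is bound * (hi - lo).
    """
    while True:
        x = rng.uniform(lo, hi)              # propose uniformly on the support
        if rng.random() * bound <= pdf(x):   # accept with probability pdf(x) / bound
            return x


rng = random.Random(1)
draws = [rejection_sample(lambda x: 2 * x, 0.0, 1.0, 2.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to the exact mean 2/3
```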

> A second problem is that you only included
> one-dimensional distributions, and not multi-dimensional distributions.

The same sort of argument applies here (and again, you will note that
there are no multivariate distributions in the GSL). In general, the
best way to simulate multivariate random quantities is very
problem-specific. However, for well-known families they can often be
simulated quite straightforwardly using univariate quantities, so having
them pre-defined is not such a big deal anyway.
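As an illustration of building a well-known multivariate family from univariate quantities, a Python sketch of a bivariate normal draw from two independent N(0, 1) variates (via Python's random.gauss). The function name and parameters are hypothetical; the construction applies the 2x2 Cholesky factor of the covariance matrix.

```python
import math
import random


def bivariate_normal(mu1: float, mu2: float, s1: float, s2: float,
                     rho: float, rng: random.Random) -> tuple:
    """One draw from a bivariate normal built from two independent N(0, 1) draws.

    Applies the Cholesky factor of [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]].
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x1, x2


rng = random.Random(2)
pairs = [bivariate_normal(0.0, 5.0, 1.0, 2.0, 0.8, rng) for _ in range(20000)]
xs, ys = zip(*pairs)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / len(xs))
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / len(ys))
print(cov / (sx * sy))  # sample correlation, close to 0.8
```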

> I think we should look more closely at MathML and use it to
> specify random variables. For example, the specification mentions the
> types random_variable, continuous_random_variable, and
> discrete_random_variable,
>
> so I think one can declare
>
> <declare type="discrete_random_variable">
> <ci> X </ci>
> </declare>
>
> or
>
> <ci type="discrete_random_variable"> X </ci>
>
> but I have not found out how the distribution can be declared.

Me neither - as far as I can tell the MathML people haven't really
thought about a sensible way to do this at all. So I think we are pretty
much on our own here.

> In any
> case, I would prefer to have a construct where I can define X as a
> discrete random variable with support {0,1} and
> p(0) = 0.1 and p(1) = 0.9

This is called a Bernoulli distribution. We can easily add it if you
like. It is a special case of the Binomial distribution, which might
also be worth adding. As we say in the proposal, we deliberately tried
to keep the number of distributions to a minimum, in order that people
think seriously about implementing it. For example, the Bernoulli above can be
constructed from a U~UniformRandom(0,1) by setting X to be 1 if U>0.1
and X=0 otherwise. So it isn't strictly necessary.
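The construction described above, as a short Python sketch (bernoulli_from_uniform is a hypothetical name; Python's random.random() plays the role of UniformRandom(0,1)). Because P(U <= 0.1) = 0.1, setting X = 1 when U > 0.1 gives P(X = 1) = 0.9, matching the requested p(0) = 0.1 and p(1) = 0.9.

```python
import random


def bernoulli_from_uniform(p_zero: float, rng: random.Random) -> int:
    """X = 1 if U > p_zero, else X = 0, with U ~ Uniform(0, 1).

    Since P(U <= p_zero) = p_zero, this gives P(X = 0) = p_zero and
    P(X = 1) = 1 - p_zero.
    """
    u = rng.random()  # U in [0, 1)
    return 1 if u > p_zero else 0


rng = random.Random(3)
xs = [bernoulli_from_uniform(0.1, rng) for _ in range(100000)]
print(sum(xs) / len(xs))  # close to 0.9
```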

> And similarly for continuous variables.
>
> X a continuous random variable with support [0,1) and
> p(X=x) = 2 x
>
> (just as an example).

That is a nice example, because you can analytically integrate the PDF
to get the CDF and then analytically invert it to get the transformation
required to sample the distribution by transforming a
UniformRandom(0,1). In general these two steps are not analytically
tractable or numerically efficient. Efficiently simulating random
quantities isn't easy in general - take a look at the GSL source code
for generating samples from the gamma distribution (gsl_ran_gamma) to
see what I mean.
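For the density f(x) = 2x on [0,1), the two analytic steps described above work out as F(x) = integral from 0 to x of 2t dt = x^2, and solving u = x^2 gives X = sqrt(U) with U ~ UniformRandom(0,1). A Python sketch of that inverse-transform sampler (sample_two_x is a hypothetical name):

```python
import math
import random


def sample_two_x(rng: random.Random) -> float:
    """Inverse-transform draw from the density f(x) = 2x on [0, 1).

    CDF: F(x) = x**2, so F^{-1}(u) = sqrt(u) and X = sqrt(U), U ~ Uniform(0, 1).
    """
    return math.sqrt(rng.random())


rng = random.Random(4)
draws = [sample_two_x(rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to E[X] = 2/3
```

Each uniform draw yields exactly one sample here, but only because F has a closed-form inverse, which is the exception rather than the rule.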

Regards,

--
Darren Wilkinson
email: mailto:darrenjw...@btinternet.com
work web: http://www.staff.ncl.ac.uk/d.j.wilkinson/
home web: http://www.darrenjwilkinson.btinternet.co.uk/

Darren Wilkinson

May 7, 2005, 4:13:54 AM

to SBML Discussion List

Howard Salis wrote:
> I think this would be very useful, especially for representing cell
> division events and transcriptional & translational elongation events.

Yes, I agree - I think this feature will be useful in a variety of contexts.

> One suggestion: Maybe include the parameter name alongside the
> parameter? Something like
>
> <math xmlns="http://www.w3.org/1998/Math/MathML">
> <csymbol encoding="text"
> definitionURL="http://www.sbml.org/sbml/symbols/uniformRandom">
> <cn name = "a">0</cn>
> <cn name = "b">1</cn>
> </csymbol>
> </math>
>
> Just to prevent confusion if the numbers are somehow switched in order.
> (I think all parameters should have some name.)

I know exactly what you mean, but as far as I know, this isn't really
the MathML way of doing things. However, I'd be the first to admit that
we aren't really MathML experts - what do others think?

Cheers,

Colin Gillespie

Sep 4, 2016, 9:11:58 PM

to sbml-d...@googlegroups.com, sbml-d...@caltech.edu
