Joerg R. Weimar wrote:
> Introducing random variables is useful for many projects.
I agree - it is not supposed to be in any way restricted to discrete
and/or stochastic models.
> I find it
> problematic to introduce only a fixed set of distribution functions. How
> about defining a random variable with a PDF or CDF given by a MathML
> function which can then be defined by the user or compared to a standard
> set of functions.
It is easy to define new distributions. It is much harder to simulate
random quantities from them. If it were easy, why doesn't the GSL let
you supply a PDF and get a random quantity back? See below for further
details.
> A second problem is that you only included
> one-dimensional distributions, and not multi-dimensional distributions.
The same sort of argument applies here (and again, you will note that
there are no multivariate distributions in the GSL). In general, the
best way to simulate multivariate random quantities is very
problem-specific. However, for well-known families they can often be
simulated quite straightforwardly using univariate quantities, so having
them pre-defined is not such a big deal anyway.
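To illustrate what I mean by building a multivariate quantity from
univariate ones, here is a rough sketch (Python purely for illustration,
and the function name is made up, not part of the proposal) that
produces a pair of standard normals with correlation rho from two
independent standard normal draws:

```python
import math
import random

def bivariate_normal(rho, rng=random):
    """Return (x, y), standard normals with correlation rho (hypothetical helper)."""
    z1 = rng.gauss(0.0, 1.0)  # independent univariate N(0,1) draws
    z2 = rng.gauss(0.0, 1.0)
    x = z1
    # Mix z1 and z2 so that Var(y) = 1 and Cov(x, y) = rho.
    y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return x, y

rng = random.Random(1)
pairs = [bivariate_normal(0.8, rng) for _ in range(10000)]
# Since both margins have mean 0 and variance 1, the mean of x*y estimates rho.
print(sum(x * y for x, y in pairs) / len(pairs))
```

This only covers one well-known family, of course. It says nothing
about the general case.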
> I think we should look more closely into the MathML and use it to
> specify random variables. For example, the specification mentions the
> types random_variable, continuous_random_variable, and
> discrete_random_variable
>
> so that I think, one can declare
>
> <declare type="discrete_random_variable">
> <ci> X </ci>
> </declare>
>
> or
>
> <ci type="discrete_random_variable"> X </ci>
>
> but I have not found out how the distribution can be declared.
Me neither - as far as I can tell the MathML people haven't really
thought about a sensible way to do this at all. So I think we are pretty
much on our own here.
> In any
> case, I would prefer to have a construct where I can define X as a
> discrete random variable with support {0,1} and
> p(0) = 0.1 and p(1) = 0.9
This is called a Bernoulli distribution. We can easily add it if you
like. It is a special case of the Binomial distribution, which might
also be worth adding. As we say in the proposal, we deliberately tried
to keep the number of distributions to a minimum, so that people will
think seriously about implementing them. E.g. the Bernoulli above can
be constructed from a U~UniformRandom(0,1) by setting X=1 if U>0.1
(which happens with probability 0.9) and X=0 otherwise. So it isn't
strictly necessary.
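A minimal sketch of that construction (Python just for illustration;
the function name and its argument are my own invention, not anything
in the proposal):

```python
import random

def bernoulli_from_uniform(p_zero, rng=random):
    """Return 0 with probability p_zero and 1 otherwise (hypothetical helper)."""
    u = rng.random()  # U ~ UniformRandom(0,1)
    return 1 if u > p_zero else 0

rng = random.Random(42)
draws = [bernoulli_from_uniform(0.1, rng) for _ in range(10000)]
# The fraction of ones should be close to p(1) = 0.9.
print(sum(draws) / len(draws))
```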
> And similarly for continuous variables.
>
> X a continuous random variable with support [0,1) and
> p(X=x) = 2 x
>
> (just as an example).
That is a nice example, because you can analytically integrate the PDF
to get the CDF (here F(x) = x^2) and then analytically invert it to get
the transformation needed to sample the distribution from a
U~UniformRandom(0,1) (here X = sqrt(U)). In general these two steps are
not analytically tractable or numerically efficient. Efficiently
simulating random
quantities isn't easy in general - take a look at the GSL source code
for generating samples from the gamma distribution (gsl_ran_gamma) to
see what I mean.
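For your example the inversion approach can be sketched as follows
(again Python purely for illustration, with a made-up function name).
It works only because this particular CDF happens to have a closed-form
inverse:

```python
import math
import random

def sample_pdf_2x(rng=random):
    """Draw from the density f(x) = 2x on [0,1) by inverting F(x) = x^2."""
    u = rng.random()     # U ~ UniformRandom(0,1)
    return math.sqrt(u)  # X = F^{-1}(U) = sqrt(U)

rng = random.Random(7)
samples = [sample_pdf_2x(rng) for _ in range(10000)]
# The sample mean should be close to E[X] = 2/3 for this density.
print(sum(samples) / len(samples))
```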
Regards,
--
Darren Wilkinson
email: darrenjw...@btinternet.com
work web: http://www.staff.ncl.ac.uk/d.j.wilkinson/
home web: http://www.darrenjwilkinson.btinternet.co.uk/