--
Lionel Roubeyrie
lionel.r...@gmail.com
http://youarealegend.blogspot.com
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Hey,
For the last few weeks, there has been an ungoing effort to develop
Gaussian Processes (which are another name for Kriging) in the
scikit-learn (http://scikit-learn.sourceforge.net/). The initial code has
recieved a lot of comments and subsequent work:
https://github.com/scikit-learn/scikit-learn/pull/14
The scikit-learn is current a very active project, with many contributors
that share a good expertise on machine learning and computational
statistics issue, a high standard for code, and frequent releases. It
would be really great if people who have usage or knowledge (or both) of
Gaussian processes could join in the discussion on the Gaussian processes
branch and, if possible, help improve the code or the documentation, or
the examples.
Hopefully this would open the door to having Gaussian process (or
Kriging) available to the community in a standard package.
Cheers,
Gael
Hi,
PyMC contains code for Gaussian Processes under the MIT license
http://code.google.com/p/pymc/
They even have a GPUserGuide.pdf
I don't know anything about Gaussian processes, but I had seen them in PyMC and thought I'd mention it.
Christoph
> I don't know anything about Gaussian processes, but I had seen them in
> PyMC and thought I'd mention it.
Sorry, I should have said 'Gaussian process regression', which is the
full name, and is an equivalent to Kriging. Gaussian processes in
themself are a very large class of probabilistic models.
AFAICT, PyMC does not have any Gaussian process regression, and it does
seem a bit outside its scope.
Names, jargon, ... all this can be terribly confusing.
Thanks for your input,
Gael
2010/11/20 Gael Varoquaux <gael.va...@normalesup.org>:
--
Lionel Roubeyrie
lionel.r...@gmail.com
http://youarealegend.blogspot.com
I'm pretty sure it does. See section 1.4 "Nonparametric regression"
and 2.4 "Geostatistical example" in the GP User's Guide:
http://pymc.googlecode.com/files/GPUserGuide.pdf
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
> I'm pretty sure it does. See section 1.4 "Nonparametric regression"
> and 2.4 "Geostatistical example" in the GP User's Guide:
> http://pymc.googlecode.com/files/GPUserGuide.pdf
Yes, you are right. My bad. The good news is that it means that the name
is not too badly overloaded.
I see that they do the estimation by sampling the posterior, whereas the
proposed contribution in the scikit simply does a point estimate using
the scipy's optimizers. I guess that PyMC's approach gives a full
posterior estimate, and is thus richer than the point estimate, but I
would except it to be slower. I wonder if they are any other fundemental
differences (I don't know Gaussian processes terribly well).
Gael
Well, the posterior is always Gaussian, so point estimates with 1-SD
error bands characterize the posterior perfectly well! pymc.gp does
point estimates, too. See the Mean.observe() method. It used to live
as a separate package by another author before they decided to merge
it into PyMC.
But yes, kriging is a specialization of GP regression by another name.
The main distint features of kriging are that the covariance functions
usually take a particular form (a nonzero variance called the "nugget"
infinitesimally off of 0 and increasing smoothly to a limiting value
called the "sill" far from 0), and the covariance function is often
estimated from the data. Oh, and no one outside of geostatistics uses
the word "kriging". ;-)
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
OK. I new so much of the sampling work in PyMC that I didn't look well
enough at the code.
> But yes, kriging is a specialization of GP regression by another name.
> The main distint features of kriging are that the covariance functions
> usually take a particular form (a nonzero variance called the "nugget"
> infinitesimally off of 0 and increasing smoothly to a limiting value
> called the "sill" far from 0), and the covariance function is often
> estimated from the data. Oh, and no one outside of geostatistics uses
> the word "kriging". ;-)
Thanks for your precisions,
G
Well, that's a question of point of view. If you are trying to do a
package specific to geostatistics, than it may not make much sens.
However, I personnaly think that establishing barrier between fields with
different codes solving different variants of the same problem does not
help scientific and technical progress. On the other hand, it is clear
that people come in with different vocabularies and expectations, and
thus 'swiss-army-knife' codes may not do much good either.
We thought that a Gaussian process regression code could fit well in
scikit learn because it is a problem that is well identified by the
machine learning community and recieves on going research from this
community. As a result, such code can benefit from other algorithms
implemented in the scikit for instance to do sparse Gaussian process
regression, a technique which can make Gaussian process regression both
faster, and more stable on high-dimensional data.
> People who are looking for a package to interpolate data using kriging are
> going to expect to:
> a) specify which type of covariance function they're using from a number
> of commonly used ones,
> b) fit this function from the observed data,
> c) review the fit of this function and have manual control it function,
> d) have a covariance function that varies depending on azimuth (Or at
> least a way to test for the degree and direction of anisotropy in the
> observed data and use this when interpolating),
> d) use other related methods (such as co-kriging to incorporate multiple
> variables, or stochastic simulation using the same covariance functions,
> etc)
> e) have lots of control over the search window used when interpolating
> (which is a bit of a different topic)
Thanks a lot for the precisions, this is useful. I can see that to do
Kriging you are adding a set of assumptions to the Gaussian process
regression. Are you suggesting that it would be worth having separate
Kriging objects as sub classes of the GaussianProcess objects?
> I'm not trying to say that it's a bad thing to combine similar code, just
> be aware that the first thing that someone's going to think when they hear
> "kriging" is "How do I build and fit a variogram with this module?".
Thank you. I was certainly not aware (I am certainly not a Kriging nor a
Gaussian Process expert). I am no clue what a variogram is. It does seem
that any code that wants to cater for 'Kriging' users will need some
Kriging-specific functionality.
If people are (still) interested in the effort underway in the
scikit-learn[*], it might be great to contribute a Kriging-specific
module that uses the more general-purpose Gaussian process code to
achieve what geostatisticians call Kriging. If there is some
freely-downloadable geostatistics data, it would be great to make an
example (similar to the one in PyMC) that ensures that comon tasks in
geostatistics can easily be done.
As a side note, now that I am having a closer look at the PyMC GP
documentation, there seems to be some really nice and fancy code in
there, and it is very well documented.
Ga�l
>> I'm not trying to say that it's a bad thing to combine similar code, just
>> be aware that the first thing that someone's going to think when they hear
>> "kriging" is "How do I build and fit a variogram with this module?".
>
> Thank you. I was certainly not aware (I am certainly not a Kriging nor a
> Gaussian Process expert). I am no clue what a variogram is. It does seem
> that any code that wants to cater for 'Kriging' users will need some
> Kriging-specific functionality.
FWIW, a variogram is a different way of representing the covariance
function of a GP. It obscures the relationship kriging/GPs have with
multidimensional Gaussian distributions, but it arguably has a closer
relationship to observable or estimable quantities. Assuming isotropy
for the moment, it is a function of radius that describes the variance
of an r-distant point conditioned on knowing the value of the point at
r=0. That's where the "nugget" and "sill" values I described earlier
come from. Exactly at r=0, the variance is 0, naturally, but
infinitesimally close to 0, it takes the nonzero nugget value. The
nugget roughly represents the uncertainty of any individual
observation. The variance (usually) increases as the radius increases
up to a limiting value called the sill. This is the overall variance
in the data. The variogram can be estimated by looking at all of the
squared pairwise differences in the observed values plotted as a
function of the pairwise distances.
http://en.wikipedia.org/wiki/Variogram
Naturally, there is an extensive and well-developed literature using
this methodology, possibly more so than the GP regression formulation.
The geostatisticians were doing GPs before everyone else caught on.
:-)
> If people are (still) interested in the effort underway in the
> scikit-learn[*], it might be great to contribute a Kriging-specific
> module that uses the more general-purpose Gaussian process code to
> achieve what geostatisticians call Kriging. If there is some
> freely-downloadable geostatistics data, it would be great to make an
> example (similar to the one in PyMC) that ensures that comon tasks in
> geostatistics can easily be done.
>
> As a side note, now that I am having a closer look at the PyMC GP
> documentation, there seems to be some really nice and fancy code in
> there, and it is very well documented.
Yup! They've done some good work.
I was looking for a kridging module recently and came across this. I haven't tried it out yet but am getting ready to. It uses numpy arrays and also is able to read/write GSLib files. GSLib seems to be a fairly established command line library in the Geostats world.
- dharhas
2010/11/22 Dharhas Pothina <Dharhas...@twdb.state.tx.us>:
- dharhas
>>> Lionel Roubeyrie <lionel.r...@gmail.com> 11/22/2010 9:15 AM >>>
I'm the author PyMC's GP module. Sorry to come late to this thread.
The discussion of my module has been on target, and thanks very much
for the kind words... as everyone here knows it's nice when people
notice code that you've worked hard on. I have a couple of hopefully
relevant things to say about it.
First, the GP module is broader in scope than what people typically
mean by GP regression and kriging. The statistical model underlying
typical GPR/K says that the data are normally distributed with
expectations equal to the GP's value at particular, known locations.
Further, the mean and covariance parameters of the field, as well as
the variance of the data, are typically fixed before starting the
regression.
With the GP module, the mean and covariance parameters can be unknown,
and the data can depend on the field in any way; as a random example,
each data point could be Gamma distributed, with parameters determined
by a nonlinear transformation of the field's value at several unknown
locations.
That said, the module has a very pronounced fast path that restricts
its practical model space to Bayesian geostatistics, which means the
aforementioned locations have to be known before starting the
regression. This is still a superset of GPR/K. There are numerous
examples of the GP module in use for Bayesian geostatistics at
github.com/malaria-atlas-project.
Second, the parts of the GP module that would help with GPR/K are not
very tightly bound to either the rest of PyMC or the Bayesian
paradigm, and could be pulled out. These parts are the Mean,
Covariance and Realization objects, functions like observe and
point_predict, and their components; but not the GP submodels and step
methods mentioned in the user guide.
Any questions on the GP module are welcome at groups.google.com/p/
pymc. I'm looking forward to checking out the work in progress on the
scikit.
Cheers,
Anand
On Nov 23, 2:45 pm, "Dharhas Pothina"
<Dharhas.Poth...@twdb.state.tx.us> wrote:
> We were planning to project our irregular data onto a cartesian grid and try and use matplotlib to visualize the variograms. I don't think I know enough about the math ofkrigingto be of much help in the coding but I might be able to give your module a try if I can find time between deadlines.
>
> - dharhas
>
> >>> Lionel Roubeyrie <lionel.roubey...@gmail.com> 11/22/2010 9:15 AM >>>
>
> I have tried hpgl and had some discussions with one of the main
> developper, but hpgl works only on cartesian (regular) grid where I
> want to have the possibility to have predictions on irregular points
> and have the possibility to visualize variograms
>
> 2010/11/22 Dharhas Pothina <Dharhas.Poth...@twdb.state.tx.us>:
>
>
>
>
>
>
>
>
>
>
>
> > What about this package?http://hpgl.sourceforge.net/
>
> > I was looking for a kridging module recently and came across this. I haven't tried it out yet but am getting ready to. It uses numpy arrays and also is able to read/write GSLib files. GSLib seems to be a fairly established command line library in the Geostats world.
>
> > - dharhas
>
> > On Sat, Nov 20, 2010 at 12:56 PM, Lionel Roubeyrie <
> > lionel.roubey...@gmail.com> wrote:
>
> >> Hi all,
> >> I have written a simple module forkrigingcomputation (ordinary
> >>krigingfor the moment), it's not optimized and maybe some minors
> >> errors are inside but I think it delivers corrects results. Is there
> >> some people here that can help me for optimize the code or just to
> >> have a try? I don't know the politic of this mailing-list against
> >> joined files, so I don't send it here for now.
> >> Thanks
>
> >> --
> >> Lionel Roubeyrie
> >> lionel.roubey...@gmail.com
> >>http://youarealegend.blogspot.com
> >> _______________________________________________
> >> SciPy-User mailing list
> >> SciPy-U...@scipy.org
> >>http://mail.scipy.org/mailman/listinfo/scipy-user
>
> > _______________________________________________
> > SciPy-User mailing list
> >http://mail.scipy.org/mailman/listinfo/scipy-user
>
> --
> Lionel Roubeyrie
> lionel.roubey...@gmail.comhttp://youarealegend.blogspot.com
> _______________________________________________
> SciPy-User mailing list
> SciPy-U...@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-U...@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user
2010/11/24 Anand Patil <anand.prab...@gmail.com>:
--
Lionel Roubeyrie
lionel.r...@gmail.com
http://youarealegend.blogspot.com
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user