[SciPy-User] kriging module

721 views
Skip to first unread message

Lionel Roubeyrie

unread,
Nov 20, 2010, 1:56:11 PM11/20/10
to scipy...@scipy.org
Hi all,
I have written a simple module for kriging computation (ordinary
kriging for the moment), it's not optimized and maybe some minors
errors are inside but I think it delivers corrects results. Is there
some people here that can help me for optimize the code or just to
have a try? I don't know the politic of this mailing-list against
joined files, so I don't send it here for now.
Thanks

--
Lionel Roubeyrie
lionel.r...@gmail.com
http://youarealegend.blogspot.com
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Joe Kington

unread,
Nov 20, 2010, 2:34:22 PM11/20/10
to SciPy Users List
I'm fairly familiar with geostats, so I'd be glad to help!  I've actually written kriging and cokirging modules in python (that I never cleaned up and generalized enough to share, unfortunately).

Rather than just attaching the files, why don't you put the code on github/bitbucket/google code/your-repository-of-choice?  (Or even pastebin, if it's short, for that matter.) 

Massimo Di Stefano

unread,
Nov 20, 2010, 3:42:47 PM11/20/10
to SciPy Users List
Hello All,

i'm a student in geoscience and just few weeks ago i had to learn how kringing works
to apply kriging interpolation on  geospatial dataset.

sounds really cool to have  kriging ability in python using directly scipy and numpy,
i will be very happy to help to test it on a common dtaset and compare the results with other tools (like the kriging modules in R)

thanks!!!

Massimo.

Gael Varoquaux

unread,
Nov 20, 2010, 3:50:58 PM11/20/10
to SciPy Users List
On Sat, Nov 20, 2010 at 07:56:11PM +0100, Lionel Roubeyrie wrote:
> I have written a simple module for kriging computation (ordinary
> kriging for the moment), it's not optimized and maybe some minors
> errors are inside but I think it delivers corrects results. Is there
> some people here that can help me for optimize the code or just to
> have a try? I don't know the politic of this mailing-list against
> joined files, so I don't send it here for now.

Hey,

For the last few weeks, there has been an ungoing effort to develop
Gaussian Processes (which are another name for Kriging) in the
scikit-learn (http://scikit-learn.sourceforge.net/). The initial code has
recieved a lot of comments and subsequent work:
https://github.com/scikit-learn/scikit-learn/pull/14

The scikit-learn is current a very active project, with many contributors
that share a good expertise on machine learning and computational
statistics issue, a high standard for code, and frequent releases. It
would be really great if people who have usage or knowledge (or both) of
Gaussian processes could join in the discussion on the Gaussian processes
branch and, if possible, help improve the code or the documentation, or
the examples.

Hopefully this would open the door to having Gaussian process (or
Kriging) available to the community in a standard package.

Cheers,

Gael

Christoph Deil

unread,
Nov 20, 2010, 5:40:03 PM11/20/10
to SciPy Users List

Hi,

PyMC contains code for Gaussian Processes under the MIT license
http://code.google.com/p/pymc/
They even have a GPUserGuide.pdf

I don't know anything about Gaussian processes, but I had seen them in PyMC and thought I'd mention it.

Christoph

Gael Varoquaux

unread,
Nov 20, 2010, 5:44:40 PM11/20/10
to SciPy Users List
On Sat, Nov 20, 2010 at 11:40:03PM +0100, Christoph Deil wrote:
> > Hopefully this would open the door to having Gaussian process (or
> > Kriging) available to the community in a standard package.

> I don't know anything about Gaussian processes, but I had seen them in


> PyMC and thought I'd mention it.

Sorry, I should have said 'Gaussian process regression', which is the
full name, and is an equivalent to Kriging. Gaussian processes in
themself are a very large class of probabilistic models.
AFAICT, PyMC does not have any Gaussian process regression, and it does
seem a bit outside its scope.

Names, jargon, ... all this can be terribly confusing.

Thanks for your input,

Gael

Lionel Roubeyrie

unread,
Nov 20, 2010, 5:48:51 PM11/20/10
to SciPy Users List
Ok,
so I put the code under github here :
g...@github.com:LionelR/kriging-module.git
Thanks for any feedback and Gael for the links, I will give a try.

2010/11/20 Gael Varoquaux <gael.va...@normalesup.org>:

--

Robert Kern

unread,
Nov 20, 2010, 5:59:41 PM11/20/10
to SciPy Users List
On Sat, Nov 20, 2010 at 16:44, Gael Varoquaux
<gael.va...@normalesup.org> wrote:
> On Sat, Nov 20, 2010 at 11:40:03PM +0100, Christoph Deil wrote:
>> > Hopefully this would open the door to having Gaussian process (or
>> > Kriging) available to the community in a standard package.
>
>> I don't know anything about Gaussian processes, but I had seen them in
>> PyMC and thought I'd mention it.
>
> Sorry, I should have said 'Gaussian process regression', which is the
> full name, and is an equivalent to Kriging. Gaussian processes in
> themself are a very large class of probabilistic models.
> AFAICT, PyMC does not have any Gaussian process regression, and it does
> seem a bit outside its scope.

I'm pretty sure it does. See section 1.4 "Nonparametric regression"
and 2.4 "Geostatistical example" in the GP User's Guide:

http://pymc.googlecode.com/files/GPUserGuide.pdf

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

Gael Varoquaux

unread,
Nov 20, 2010, 6:08:56 PM11/20/10
to SciPy Users List, scikit-lea...@lists.sourceforge.net
On Sat, Nov 20, 2010 at 04:59:41PM -0600, Robert Kern wrote:
> > Sorry, I should have said 'Gaussian process regression', which is the
> > full name, and is an equivalent to Kriging. Gaussian processes in
> > themself are a very large class of probabilistic models.
> > AFAICT, PyMC does not have any Gaussian process regression, and it does
> > seem a bit outside its scope.

> I'm pretty sure it does. See section 1.4 "Nonparametric regression"
> and 2.4 "Geostatistical example" in the GP User's Guide:

> http://pymc.googlecode.com/files/GPUserGuide.pdf

Yes, you are right. My bad. The good news is that it means that the name
is not too badly overloaded.

I see that they do the estimation by sampling the posterior, whereas the
proposed contribution in the scikit simply does a point estimate using
the scipy's optimizers. I guess that PyMC's approach gives a full
posterior estimate, and is thus richer than the point estimate, but I
would except it to be slower. I wonder if they are any other fundemental
differences (I don't know Gaussian processes terribly well).

Gael

Robert Kern

unread,
Nov 20, 2010, 6:35:53 PM11/20/10
to SciPy Users List, scikit-lea...@lists.sourceforge.net
On Sat, Nov 20, 2010 at 17:08, Gael Varoquaux
<gael.va...@normalesup.org> wrote:
> On Sat, Nov 20, 2010 at 04:59:41PM -0600, Robert Kern wrote:
>> > Sorry, I should have said 'Gaussian process regression', which is the
>> > full name, and is an equivalent to Kriging. Gaussian processes in
>> > themself are a very large class of probabilistic models.
>> > AFAICT, PyMC does not have any Gaussian process regression, and it does
>> > seem a bit outside its scope.
>
>> I'm pretty sure it does. See section 1.4 "Nonparametric regression"
>> and 2.4 "Geostatistical example" in the GP User's Guide:
>
>>   http://pymc.googlecode.com/files/GPUserGuide.pdf
>
> Yes, you are right. My bad. The good news is that it means that the name
> is not too badly overloaded.
>
> I see that they do the estimation by sampling the posterior, whereas the
> proposed contribution in the scikit simply does a point estimate using
> the scipy's optimizers. I guess that PyMC's approach gives a full
> posterior estimate, and is thus richer than the point estimate, but I
> would except it to be slower. I wonder if they are any other fundemental
> differences (I don't know Gaussian processes terribly well).

Well, the posterior is always Gaussian, so point estimates with 1-SD
error bands characterize the posterior perfectly well! pymc.gp does
point estimates, too. See the Mean.observe() method. It used to live
as a separate package by another author before they decided to merge
it into PyMC.

But yes, kriging is a specialization of GP regression by another name.
The main distint features of kriging are that the covariance functions
usually take a particular form (a nonzero variance called the "nugget"
infinitesimally off of 0 and increasing smoothly to a limiting value
called the "sill" far from 0), and the covariance function is often
estimated from the data. Oh, and no one outside of geostatistics uses
the word "kriging". ;-)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

Joe Kington

unread,
Nov 20, 2010, 11:28:15 PM11/20/10
to SciPy Users List

Not to be argumentative, but this is why it may not make a ton of sense to wrap "kriging" into a module that implements more general Gaussian process regression methods.

People who are looking for a package to interpolate data using kriging are going to expect to:
a) specify which type of covariance function they're using from a number of commonly used ones,
b) fit this function from the observed data,
c) review the fit of this function and have manual control it function,
d) have a covariance function that varies depending on azimuth (Or at least a way to test for the degree and direction of anisotropy in the observed data and use this when interpolating),
d) use other related methods (such as co-kriging to incorporate multiple variables, or stochastic simulation using the same covariance functions, etc)
e) have lots of control over the search window used when interpolating (which is a bit of a different topic)

From a practical standpoint, the only reason to use kriging as an interpolation method is so that you can incorporate lots of a-priori information.  No one should ever interpolate any data unless they know what the result _should_ look like. The various "kriging" methods essentially just give a lot of "knobs to tweak", so that you can build an interpolation method that produces results that behaves in a certain way for a certain case.  It's all about incorporating a-priori information.  Otherwise, just use a radial basis function, or some other smooth interpolator.

I'm not trying to say that it's a bad thing to combine similar code, just be aware that the first thing that someone's going to think when they hear "kriging" is "How do I build and fit a variogram with this module?".

-Joe


Gael Varoquaux

unread,
Nov 21, 2010, 3:43:35 AM11/21/10
to SciPy Users List, scikit-lea...@lists.sourceforge.net
On Sat, Nov 20, 2010 at 05:35:53PM -0600, Robert Kern wrote:
> Well, the posterior is always Gaussian, so point estimates with 1-SD
> error bands characterize the posterior perfectly well! pymc.gp does
> point estimates, too. See the Mean.observe() method. It used to live
> as a separate package by another author before they decided to merge
> it into PyMC.

OK. I new so much of the sampling work in PyMC that I didn't look well
enough at the code.

> But yes, kriging is a specialization of GP regression by another name.
> The main distint features of kriging are that the covariance functions
> usually take a particular form (a nonzero variance called the "nugget"
> infinitesimally off of 0 and increasing smoothly to a limiting value
> called the "sill" far from 0), and the covariance function is often
> estimated from the data. Oh, and no one outside of geostatistics uses
> the word "kriging". ;-)

Thanks for your precisions,

G

Gael Varoquaux

unread,
Nov 21, 2010, 4:07:18 AM11/21/10
to SciPy Users List, scikit-lea...@lists.sourceforge.net
On Sat, Nov 20, 2010 at 10:28:15PM -0600, Joe Kington wrote:
> Not to be argumentative, but this is why it may not make a ton of sense to
> wrap "kriging" into a module that implements more general Gaussian process
> regression methods.

Well, that's a question of point of view. If you are trying to do a
package specific to geostatistics, than it may not make much sens.
However, I personnaly think that establishing barrier between fields with
different codes solving different variants of the same problem does not
help scientific and technical progress. On the other hand, it is clear
that people come in with different vocabularies and expectations, and
thus 'swiss-army-knife' codes may not do much good either.

We thought that a Gaussian process regression code could fit well in
scikit learn because it is a problem that is well identified by the
machine learning community and recieves on going research from this
community. As a result, such code can benefit from other algorithms
implemented in the scikit for instance to do sparse Gaussian process
regression, a technique which can make Gaussian process regression both
faster, and more stable on high-dimensional data.

> People who are looking for a package to interpolate data using kriging are
> going to expect to:
> a) specify which type of covariance function they're using from a number
> of commonly used ones,
> b) fit this function from the observed data,
> c) review the fit of this function and have manual control it function,
> d) have a covariance function that varies depending on azimuth (Or at
> least a way to test for the degree and direction of anisotropy in the
> observed data and use this when interpolating),
> d) use other related methods (such as co-kriging to incorporate multiple
> variables, or stochastic simulation using the same covariance functions,
> etc)
> e) have lots of control over the search window used when interpolating
> (which is a bit of a different topic)

Thanks a lot for the precisions, this is useful. I can see that to do
Kriging you are adding a set of assumptions to the Gaussian process
regression. Are you suggesting that it would be worth having separate
Kriging objects as sub classes of the GaussianProcess objects?

> I'm not trying to say that it's a bad thing to combine similar code, just
> be aware that the first thing that someone's going to think when they hear
> "kriging" is "How do I build and fit a variogram with this module?".

Thank you. I was certainly not aware (I am certainly not a Kriging nor a
Gaussian Process expert). I am no clue what a variogram is. It does seem
that any code that wants to cater for 'Kriging' users will need some
Kriging-specific functionality.

If people are (still) interested in the effort underway in the
scikit-learn[*], it might be great to contribute a Kriging-specific
module that uses the more general-purpose Gaussian process code to
achieve what geostatisticians call Kriging. If there is some
freely-downloadable geostatistics data, it would be great to make an
example (similar to the one in PyMC) that ensures that comon tasks in
geostatistics can easily be done.

As a side note, now that I am having a closer look at the PyMC GP
documentation, there seems to be some really nice and fancy code in
there, and it is very well documented.

Ga�l

[*] https://github.com/scikit-learn/scikit-learn/pull/14

Robert Kern

unread,
Nov 21, 2010, 7:38:20 PM11/21/10
to scikit-lea...@lists.sourceforge.net, SciPy Users List
On Sun, Nov 21, 2010 at 03:07, Gael Varoquaux
<gael.va...@normalesup.org> wrote:
> On Sat, Nov 20, 2010 at 10:28:15PM -0600, Joe Kington wrote:

>>    I'm not trying to say that it's a bad thing to combine similar code, just
>>    be aware that the first thing that someone's going to think when they hear
>>    "kriging" is "How do I build and fit a variogram with this module?".
>
> Thank you. I was certainly not aware (I am certainly not a Kriging nor a
> Gaussian Process expert). I am no clue what a variogram is. It does seem
> that any code that wants to cater for 'Kriging' users will need some
> Kriging-specific functionality.

FWIW, a variogram is a different way of representing the covariance
function of a GP. It obscures the relationship kriging/GPs have with
multidimensional Gaussian distributions, but it arguably has a closer
relationship to observable or estimable quantities. Assuming isotropy
for the moment, it is a function of radius that describes the variance
of an r-distant point conditioned on knowing the value of the point at
r=0. That's where the "nugget" and "sill" values I described earlier
come from. Exactly at r=0, the variance is 0, naturally, but
infinitesimally close to 0, it takes the nonzero nugget value. The
nugget roughly represents the uncertainty of any individual
observation. The variance (usually) increases as the radius increases
up to a limiting value called the sill. This is the overall variance
in the data. The variogram can be estimated by looking at all of the
squared pairwise differences in the observed values plotted as a
function of the pairwise distances.

http://en.wikipedia.org/wiki/Variogram

Naturally, there is an extensive and well-developed literature using
this methodology, possibly more so than the GP regression formulation.
The geostatisticians were doing GPs before everyone else caught on.
:-)

> If people are (still) interested in the effort underway in the
> scikit-learn[*], it might be great to contribute a Kriging-specific
> module that uses the more general-purpose Gaussian process code to
> achieve what geostatisticians call Kriging. If there is some
> freely-downloadable geostatistics data, it would be great to make an
> example (similar to the one in PyMC) that ensures that comon tasks in
> geostatistics can easily be done.
>
> As a side note, now that I am having a closer look at the PyMC GP
> documentation, there seems to be some really nice and fancy code in
> there, and it is very well documented.

Yup! They've done some good work.

Dharhas Pothina

unread,
Nov 22, 2010, 9:17:53 AM11/22/10
to SciPy Users List

What about this package? http://hpgl.sourceforge.net/

I was looking for a kridging module recently and came across this. I haven't tried it out yet but am getting ready to. It uses numpy arrays and also is able to read/write GSLib files. GSLib seems to be a fairly established command line library in the Geostats world.

- dharhas

Lionel Roubeyrie

unread,
Nov 22, 2010, 10:15:05 AM11/22/10
to SciPy Users List
I have tried hpgl and had some discussions with one of the main
developper, but hpgl works only on cartesian (regular) grid where I
want to have the possibility to have predictions on irregular points
and have the possibility to visualize variograms

2010/11/22 Dharhas Pothina <Dharhas...@twdb.state.tx.us>:

Dharhas Pothina

unread,
Nov 23, 2010, 9:45:35 AM11/23/10
to SciPy Users List

We were planning to project our irregular data onto a cartesian grid and try and use matplotlib to visualize the variograms. I don't think I know enough about the math of kriging to be of much help in the coding but I might be able to give your module a try if I can find time between deadlines.

- dharhas

>>> Lionel Roubeyrie <lionel.r...@gmail.com> 11/22/2010 9:15 AM >>>

Anand Patil

unread,
Nov 24, 2010, 11:13:07 AM11/24/10
to scipy...@scipy.org
Hi everyone,


I'm the author PyMC's GP module. Sorry to come late to this thread.
The discussion of my module has been on target, and thanks very much
for the kind words... as everyone here knows it's nice when people
notice code that you've worked hard on. I have a couple of hopefully
relevant things to say about it.


First, the GP module is broader in scope than what people typically
mean by GP regression and kriging. The statistical model underlying
typical GPR/K says that the data are normally distributed with
expectations equal to the GP's value at particular, known locations.
Further, the mean and covariance parameters of the field, as well as
the variance of the data, are typically fixed before starting the
regression.

With the GP module, the mean and covariance parameters can be unknown,
and the data can depend on the field in any way; as a random example,
each data point could be Gamma distributed, with parameters determined
by a nonlinear transformation of the field's value at several unknown
locations.

That said, the module has a very pronounced fast path that restricts
its practical model space to Bayesian geostatistics, which means the
aforementioned locations have to be known before starting the
regression. This is still a superset of GPR/K. There are numerous
examples of the GP module in use for Bayesian geostatistics at
github.com/malaria-atlas-project.


Second, the parts of the GP module that would help with GPR/K are not
very tightly bound to either the rest of PyMC or the Bayesian
paradigm, and could be pulled out. These parts are the Mean,
Covariance and Realization objects, functions like observe and
point_predict, and their components; but not the GP submodels and step
methods mentioned in the user guide.


Any questions on the GP module are welcome at groups.google.com/p/
pymc. I'm looking forward to checking out the work in progress on the
scikit.

Cheers,
Anand

On Nov 23, 2:45 pm, "Dharhas Pothina"
<Dharhas.Poth...@twdb.state.tx.us> wrote:
> We were planning to project our irregular data onto a cartesian grid and try and use matplotlib to visualize the variograms. I don't think I know enough about the math ofkrigingto be of much help in the coding but I might be able to give your module a try if I can find time between deadlines.
>
> - dharhas
>
> >>> Lionel Roubeyrie <lionel.roubey...@gmail.com> 11/22/2010 9:15 AM >>>


>
> I have tried hpgl and had some discussions with one of the main
> developper, but hpgl works only on cartesian (regular) grid where I
> want to have the possibility to have predictions on irregular points
> and have the possibility to visualize variograms
>

> 2010/11/22 Dharhas Pothina <Dharhas.Poth...@twdb.state.tx.us>:
>
>
>
>
>
>
>
>
>
>
>
> > What about this package?http://hpgl.sourceforge.net/


>
> > I was looking for a kridging module recently and came across this. I haven't tried it out yet but am getting ready to. It uses numpy arrays and also is able to read/write GSLib files. GSLib seems to be a fairly established command line library in the Geostats world.
>
> > - dharhas
>
> > On Sat, Nov 20, 2010 at 12:56 PM, Lionel Roubeyrie <
> > lionel.roubey...@gmail.com> wrote:
>
> >> Hi all,
> >> I have written a simple module forkrigingcomputation (ordinary

> >>krigingfor the moment), it's not optimized and maybe some minors


> >> errors are inside but I think it delivers corrects results. Is there
> >> some people here that can help me for optimize the code or just to
> >> have a try? I don't know the politic of this mailing-list against
> >> joined files, so I don't send it here for now.
> >> Thanks
>
> >> --
> >> Lionel Roubeyrie

> >> lionel.roubey...@gmail.com


> >>http://youarealegend.blogspot.com
> >> _______________________________________________
> >> SciPy-User mailing list

> >> SciPy-U...@scipy.org


> >>http://mail.scipy.org/mailman/listinfo/scipy-user
>
> > _______________________________________________
> > SciPy-User mailing list

> > SciPy-U...@scipy.org

> lionel.roubey...@gmail.comhttp://youarealegend.blogspot.com
> _______________________________________________
> SciPy-User mailing list
> SciPy-U...@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-U...@scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user

Lionel Roubeyrie

unread,
Dec 6, 2010, 9:22:22 AM12/6/10
to SciPy Users List
Hi all,
no really familiar with Git, I just see I have given a bad address...
So, if anyone want to test, here is the good one :
g...@github.com:LionelR/krige.git
I'll really appreciate any comment, thanks

2010/11/24 Anand Patil <anand.prab...@gmail.com>:

--
Lionel Roubeyrie
lionel.r...@gmail.com


http://youarealegend.blogspot.com
_______________________________________________
SciPy-User mailing list

SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Reply all
Reply to author
Forward
0 new messages