GSoC Team Taylor - SpInt - Some Initial Issues


tayo...@gmail.com

May 17, 2016, 1:33:51 PM5/17/16
to pysal-dev
Hi all,

I incorrectly shared this email with some of the mentors on my GSoC project rather than on this listserv. We have started discussing these issues, but I wanted to share them here and would welcome any feedback!

------------------------------------------------------------------------------------------------------------------------------------------------

Before I start writing code for my GSoC project, I thought it might be good to outline some of the issues I have been thinking about that I think will be key to the project at large.

1) API design 

The API would need to accommodate a basic gravity model and then also support extensions that include SAR terms, a spatial filter term, or a competing destinations term. Typically the competing destinations term is nothing more than a sum of distance-weighted attributes, which should be simple. I think it shouldn't be too hard to use the existing spatial weights module as a base for a function that creates OD-based SAR spatial weights. And there is some code available for computing spatial filters in Python (I also have some code that I wrote myself for a paper). So, to accommodate the three extensions, it might be simplest to have the user first compute these and then pass them in as optional arguments to the basic gravity model. In the case of the competing destinations or spatial filter model, no new estimator is required, so it's as easy as including the extra variable (within an OLS or Poisson regression framework). If the optional SAR term is passed in, then under the hood a new estimator could be used rather than the default OLS or Poisson estimator. I imagine the basic call could look something like

spint.gravity(required_arguments, cd=None, sf=None, w=None)

where cd would be an optional competing destinations variable, sf the optional spatial filter term, and w would be optional spatial weight, each of which would be pre-computed using code from the weights module or other utility functions.
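To make the first option concrete, here is a minimal sketch of what that single entry point might look like under the hood. All names and the design-matrix layout are hypothetical (the real function would dispatch to actual estimators); the point is just that the extensions reduce to extra columns, with the SAR weights switching the estimator:

```python
import numpy as np

def gravity(flows, o_vars, d_vars, dij, cd=None, sf=None, w=None):
    """Hypothetical sketch of the single-entry-point API.

    flows  : (n,) observed OD flows (would be passed to the estimator)
    o_vars : (n, k) origin attributes
    d_vars : (n, m) destination attributes
    dij    : (n,) OD distances/costs
    cd     : optional pre-computed competing destinations term
    sf     : optional pre-computed spatial filter term
    w      : optional pre-computed OD spatial weights (triggers a SAR estimator)
    """
    # Stack the log-linear design matrix; extensions are just extra columns.
    X = [np.log(o_vars), np.log(d_vars), np.log(np.asarray(dij)).reshape(-1, 1)]
    if cd is not None:
        X.append(np.log(np.asarray(cd)).reshape(-1, 1))
    if sf is not None:
        X.append(np.asarray(sf).reshape(-1, 1))
    X = np.hstack(X)
    # Placeholder dispatch: a SAR term changes the estimator, the other
    # extensions do not.
    estimator = "sar" if w is not None else "ols_or_poisson"
    return X, estimator
```

This is only meant to illustrate why the optional-argument design stays simple: two of the three extensions never touch the estimation code path.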

The other alternative could be to have separate calls for each extension where the corresponding optional term now becomes required:

spint.gravity(required_arguments)
spint.gravity.cd(required_arguments)
spint.gravity.sf(required_arguments)
spint.gravity.lag(required_arguments)

I am inclined to think the former is more elegant, but perhaps for a user the latter is simpler to understand?

Another layer to the API would be each "member" to the Wilson-type "family" of models: unconstrained or basic gravity model, production or origin constrained, attraction or destination constrained, and doubly constrained. This is achieved technically by adding in either balancing factors or fixed effects during estimation, depending on the estimation technique. Then building from the first two examples the API could look like either:

(a)

spint.gravity.unconstrained(required_arguments, cd=None, sf=None, w=None)
spint.gravity.production(required_arguments, cd=None, sf=None, w=None)
spint.gravity.attraction(required_arguments, cd=None, sf=None, w=None)
spint.gravity.doubly(required_arguments, cd=None, sf=None, w=None)

(b)

spint.gravity(required_arguments, constraint=None)
spint.gravity.cd(required_arguments, constraint=None)
spint.gravity.sf(required_arguments, constraint=None)
spint.gravity.lag(required_arguments, constraint=None)

I am not sure there are any clear advantages of one over the other. The only thing I can think of is that option (a) may be more intuitive because the doubly-constrained model is unique in requiring a square OD matrix (all origins are also destinations), so separate entry points might be natural, since each of the four varieties may have different input properties that need to be checked.
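As an example of the kind of per-variety validation option (a) would make easy to organize, here is a sketch of the square-matrix check the doubly-constrained entry point might run (the helper name is hypothetical):

```python
import numpy as np

def check_doubly_constrained(od_matrix):
    """Hypothetical input check for the doubly-constrained model: the
    OD matrix must be square, i.e. every origin is also a destination."""
    od = np.asarray(od_matrix)
    if od.ndim != 2 or od.shape[0] != od.shape[1]:
        raise ValueError(
            "doubly-constrained model requires a square OD matrix; "
            "got shape %s" % (od.shape,))
    return od
```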

In terms of model fitting techniques, I think it would be neat to use something similar to statsmodels, like:

model = spint.gravity.unconstrained(required_arguments, cd=None, sf=None, w=None)
results = model.fit(fit_arguments)
 
which would be natural if everything is being built on top of a GLM framework. The reasoning behind using a GLM framework was to be able to take advantage of additional count models, such as the negative binomial, relatively easily. "fit_arguments" could include the type of probability model (Gaussian, Poisson, negative binomial, etc.) and the fit technique (iteratively re-weighted least squares or Theano/Autograd MLE).
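The model/results split could be sketched roughly as follows. Everything here is hypothetical (class names, the `family`/`method` arguments), and only a Gaussian least-squares stand-in is implemented; the real version would dispatch to IRLS for the count families:

```python
import numpy as np

class Gravity:
    """Hypothetical sketch of a statsmodels-style model class."""

    def __init__(self, flows, X):
        self.flows = np.asarray(flows, dtype=float)
        self.X = np.asarray(X, dtype=float)

    def fit(self, family="poisson", method="irls"):
        # fit() would dispatch on the probability model and technique;
        # only the Gaussian least-squares case is sketched here.
        if family == "gaussian":
            beta, *_ = np.linalg.lstsq(self.X, self.flows, rcond=None)
        else:
            raise NotImplementedError("only the gaussian sketch is implemented")
        return GravityResults(beta)

class GravityResults:
    """Hypothetical results container returned by fit()."""

    def __init__(self, params):
        self.params = params
```

The appeal of this pattern is that the same model instance can be re-fit with different families or techniques without rebuilding the design matrix.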

Perhaps it might be better to stick closer to the spreg API? 

Other models that would be incorporated, but that I sort of think of as separate from those described above, are non-parametric methods: the deterministic "universal" models that mostly come out of the human mobility literature, and neural network spatial interaction models. I think it would make sense to accommodate the "universal" models separately from the gravity models. For example:

spint.mobility.radiation()
spint.mobility.inv_pop_weighted()
spint.neural()
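The radiation model is a good example of why these fit naturally in a separate namespace: it is parameter-free, so there is nothing to estimate, only a flow to evaluate. A sketch of the expected-flow formula from Simini et al. (2012), with hypothetical argument names:

```python
def radiation_flow(T_i, m_i, n_j, s_ij):
    """Expected flow from i to j under the parameter-free radiation model:
    T_i  : total outflow from origin i
    m_i  : population of origin i
    n_j  : population of destination j
    s_ij : total population within radius d_ij of i, excluding m_i and n_j
    """
    return T_i * (m_i * n_j) / ((m_i + s_ij) * (m_i + n_j + s_ij))
```

Since there is no fit step, the API for these models could simply take the inputs and return predicted flows directly, unlike the model/fit pattern above.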


2) API input

Should the API input be all arrays, should it be strings that refer to pandas columns, or should it be flexible enough to accommodate either?
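If the flexible option wins out, one simple pattern is a small coercion helper at the top of every entry point. This is a hypothetical sketch (the helper name and the idea of passing the frame alongside column names are assumptions, not a settled design):

```python
import numpy as np

def as_array(data, frame=None):
    """Hypothetical coercion helper: accept either an array-like directly,
    or a column name to be looked up in a pandas DataFrame (or any mapping
    of names to columns)."""
    if isinstance(data, str):
        if frame is None:
            raise ValueError(
                "column name '%s' given but no data frame supplied" % data)
        return np.asarray(frame[data])
    return np.asarray(data)
```

Usage would then be uniform internally: `as_array(flows)` for an array input, or `as_array("flows", frame=df)` for the pandas-style call.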

3) GLM framework base

Should statsmodels be used directly as a base, or should I write my own that is influenced by it? I have already started writing my own base that is extremely similar. My thinking is that building my own on theirs would help keep things simpler, since I could just grab pieces as needed. Maybe this is short-sighted?

4) Poisson SAR (lag) estimator

Does anyone have literature recommendations on this? I have some papers that I am still reading through, but I haven't found much on it. It's one of the substantive areas within my proposal where I may know the least, so my goal is to remedy that asap.


Cheers,
Taylor