Creating a Mesh for Continental United States

699 views
Skip to first unread message

Mike Gahan

unread,
Aug 11, 2014, 10:39:11 PM8/11/14
to r-inla-disc...@googlegroups.com
I am currently working on a project that is attempting to create a risk map of auto-accidents in the
Continental United States.  I have 1 million records of car accidents with their longitudes, latitudes, as well 
as some other covariates.  I plan on eventually using many INLA procedures on this data, but for the time
being, my objective is to create a raster object that represents a continuous surface of risk (defined by the
probability of an accident).

Would it be overly ambitious to attempt to create this raster surface with all 1 million observations?
The process to create the mesh is taking quite a while (I admit that I do not understand the best
way to create this mesh).  Does anyone have suggestions on the best way to go about creating the mesh
object?  I have been making my way through the spde tutorial and I am a bit confused on the parameters
to this function.

Thanks so much for creating this wonderful package,
Mike

Elias T Krainski

unread,
Aug 12, 2014, 4:53:43 AM8/12/14
to r-inla-disc...@googlegroups.com
On 12/08/14 04:39, Mike Gahan wrote:
Would it be overly ambitious to attempt to create this raster surface with all 1 million observations?
The raster resolution is not related to the dimension of your data. You can estimate such surface in high or small resolution with few or larger number of observations. The main limitation is the number of basis functions, related to number of mesh triangles.
The process to create the mesh is taking quite a while (I admit that I do not understand the best
way to create this mesh).  Does anyone have suggestions on the best way to go about creating the mesh
object?  I have been making my way through the spde tutorial and I am a bit confused on the parameters
to this function.
The main argument is max.edge=c(a,b), where
  a = maximum length of the inner edges
  b = maximum length of the outer edges
So, you can have a mesh with few triangles when you set 'a' and 'b' bigger and you can have a mesh with more triangles setting they smaller values. An important argument is the 'cuttof', used to avoid small triangles.

For non-convex shaped domains, it is good to consider using the inla.nonconvex.hull() function before creating the mesh. Please consider the following examples:

require(maps)
us <- map("usa", fill=TRUE, col="transparent", plot=FALSE)
IDs <- sapply(strsplit(us$names, ":"), function(x) x[1])

require(sp)
prj <- CRS("+proj=longlat +datum=WGS84")

require(maptools)
sp <- map2SpatialPolygons(us, IDs=IDs, proj4string=prj)

### first way: using the polygon
require(INLA)
pl.bound <- inla.sp2segment(sp)

mesh0 <- inla.mesh.2d(boundary=pl.bound, max.edge=c(3,7), cutoff=1)

### second way: using the location points
## suppose you have these points:
require(splancs)
pts <- csr(sp@polygons[[3]]@Polygons[[1]]@coords, 1e5)

require(INLA)
pt.bond <- inla.nonconvex.hull(pts, 2, 2)
pt.bond2 <- inla.nonconvex.hull(pts, 1, 1, 100)

mesh1 <- inla.mesh.2d(boundary=pt.bond, max.edge=c(5,10), cut=1)
mesh2 <- inla.mesh.2d(boundary=pt.bond, max.edge=c(3,7), cut=1)
mesh3 <- inla.mesh.2d(boundary=pt.bond2, max.edge=c(2,4), cut=1, off=c(1e-5,4))

c(mesh0$n, mesh1$n, mesh2$n, mesh3$n)

par(mfrow=c(2,2), mar=c(0,0,1,0))
plot(mesh0, asp=1)
plot(mesh1, asp=1)
plot(mesh2, asp=1)
plot(mesh3, asp=1)

best,
Elias.

Thanks so much for creating this wonderful package,
Mike
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Finn Lindgren

unread,
Aug 12, 2014, 4:58:23 AM8/12/14
to Mike Gahan, r-inla-disc...@googlegroups.com
On 12 August 2014 03:39, Mike Gahan <michae...@gmail.com> wrote:
> Would it be overly ambitious to attempt to create this raster surface with
> all 1 million observations?

I guess that the 1 million observation locations are somewhat
clustered, and not themselves on a grid/raster?
What dimension would you ideally want for the raster? The
computational complexity is limited by that much more than by the
number of observations.

> The process to create the mesh is taking quite a while (I admit that I do
> not understand the best
> way to create this mesh).

If you feed all 1e6 locations into inla.mesh.create/2d, most of the
time will be spent in an initial filtering phase to deal with
clustering, by finding all points that are within the "cutoff"
distance of eachother. I just had a look at my code for that, and the
way it's currently written that's an order O(n^2) operation. I think
I can use existing code to bring that down to O(n), but that will not
solve your actual problem.

See Elias' reply for suggestions for how to build a more useful mesh.

Finn

Mike Gahan

unread,
Aug 12, 2014, 10:07:32 AM8/12/14
to r-inla-disc...@googlegroups.com, michae...@gmail.com
Thanks so much!  This is very helpful.  My data does have clusters in the big cities (obviously more population = more auto accidents), but there aren't too many areas of the USA where there is no data at all.  I am working on a rather large server with 1 TB ram and 64 cores available, so I am hoping to use INLA to take advantage of this.

Finn Lindgren

unread,
Sep 10, 2014, 5:06:04 AM9/10/14
to r-inla-disc...@googlegroups.com, michae...@gmail.com

On Tuesday, August 12, 2014 3:58:23 AM UTC-5, Finn Lindgren wrote:
If you feed all 1e6 locations into inla.mesh.create/2d, most of the
time will be spent in an initial filtering phase to deal with
clustering, by finding all points that are within the "cutoff"
distance of eachother.  I just had a look at my code for that, and the
way it's currently written that's an order O(n^2) operation.

The latest package build uses a new filtering algorithm that for most kinds of input
is closer to
  O( (m + n) log(m) ),
where m is the number of points in the filtering output, instead of
O(n^2), so using "cutoff" to filter large sets of input locations is now a lot faster;
a factor ~10 for the artificial test cases I've tried.

Finn

Mike Gahan

unread,
Sep 10, 2014, 4:54:12 PM9/10/14
to Finn Lindgren, r-inla-disc...@googlegroups.com
Excellent!  I used the above code on a sample of 500,000 points. Using 10 cores, the original procedure finished after about 12 hours. This speed up is much appreciated.

On another note, are there any video tutorials on INLA? I see from your website that you guys have short courses all the time, but they seem to be in Europe. I am in the USA. Do you guys ever record the short courses? Just curious. Is your team a part of this text that is being released in February 2015?  http://www.amazon.com/Spatial-Spatio-temporal-Bayesian-Models-INLA/dp/1118326555/ref=sr_1_1?ie=UTF8&qid=1410382328&sr=8-1&keywords=r+inla

Thanks for all of your work with this wonderful package. -Mike

Finn Lindgren

unread,
Sep 10, 2014, 5:09:45 PM9/10/14
to Mike Gahan, r-inla-disc...@googlegroups.com
On 10/09/14 21:54, Mike Gahan wrote:
> Excellent! I used the above code on a sample of 500,000 points. Using
> 10 cores, the original procedure finished after about 12 hours. This
> speed up is much appreciated.

For 500,000 random locations and default cutoff (1e-12) generating a
mesh (without refinement) on a quad-core laptop takes ~75sec with the
new code

> loc=matrix(runif(2*5e5),5e5,2)
> system.time({newmesh=inla.mesh.create(loc=loc)})
user system elapsed
73.096 3.600 76.590

refinement would add a bit to that, but using a more sensible 'cutoff'
would reduce it, for example:

> system.time({newmesh=inla.mesh.create(loc=loc,cutoff=0.01)})
user system elapsed
2.184 0.036 2.233

Let me know if your new timings are significantly different from this;
there may be special cases that bring out the worst case behaviour of
the algorithm in ways I haven't thought of yet.

> On another note, are there any video tutorials on INLA? I see from your
> website that you guys have short courses all the time, but they seem to
> be in Europe. I am in the USA. Do you guys ever record the short
> courses?

A few courses have been in the US, but yes, most of them are in Europe.
Videos have been discussed...

> Just curious. Is your team a part of this text that is being
> released in February 2015?
> http://www.amazon.com/Spatial-Spatio-temporal-Bayesian-Models-INLA/dp/1118326555/ref=sr_1_1?ie=UTF8&qid=1410382328&sr=8-1&keywords=r+inla

Yes, we're working with them.

> Thanks for all of your work with this wonderful package. -Mike

You're welcome!

Finn

Mike Gahan

unread,
Sep 10, 2014, 7:31:09 PM9/10/14
to Finn Lindgren, r-inla-disc...@googlegroups.com
Is the update only in the testing version?

Finn Lindgren

unread,
Sep 11, 2014, 1:36:31 AM9/11/14
to Mike Gahan, r-inla-disc...@googlegroups.com

> On 11 Sep 2014, at 00:31, Mike Gahan <michae...@gmail.com> wrote:
>
> Is the update only in the testing version?

Yes.

Finn
Reply all
Reply to author
Forward
0 new messages