Workflow with Julia-Geo ?

250 views
Skip to first unread message

Philippe Roy

unread,
Oct 7, 2016, 9:18:43 AM10/7/16
to julia-geo
Hello!

I'm trying to code my new project in Julia and I find it difficult right now. Any help would be appreciated!

Here's my workflow :

1- Define a Shapefile (usually a polygon)
2- Loop over tons of NetCDF files ( > 300 files that are about 2 GB each).
3- Into this loop, I'd like to extract only the grid points that are located into the Shapefile. Extracting the whole spatial domain of the NetCDF file is a no-go. I have the RAM (128 GB), but it's too slow since I'm looking to analyze about 200 climate simulations (Global Models from CMIP5).
4- Once I extracted the timeseries for the relevant grid-point --> Interpolation on a grid
5- Post-treatment of these timeseries using Observations
6- Statistics of the post-treated timeseries

Now, my problem is mainly located in steps 2 and 3. I still haven't found how to extract a subset of the spatial domain of the NetCDF file using the "NetCDF.jl" package.

Any help regarding this matter would be greatly appreciated! Thanks!

Yeesian Ng

unread,
Oct 7, 2016, 11:12:05 AM10/7/16
to julia-geo
I don't know how to achieve it using the NetCDF bindings yet. Assuming you have GDAL and ArchGDAL set up, it might be something like

AG.read("ospy/data4/aster.img") do ds
    AG.read("ospy/data4/sites.shp") do shp
        shplayer = AG.getlayer(shp, 0)
        id = AG.getfieldindex(AG.getlayerdefn(shplayer), "ID")

        transform = AG.getgeotransform(ds)
        xOrigin = transform[1]; yOrigin = transform[4]
        pixelWidth = transform[2]; pixelHeight = transform[6]

        for feature in shplayer
            geom = AG.getgeom(feature)
            x = AG.getx(geom, 0); y = AG.gety(geom, 0)
            # compute pixel offset
            xOffset = round(Int, (x - xOrigin) / pixelWidth)
            yOffset = round(Int, (y - yOrigin) / pixelHeight)
            # create a string to print out
            s = "$(AG.getfield(feature, id)) "
            for j in 1:AG.nraster(ds)
                data = AG.read(ds, j, xOffset, yOffset, 1, 1)
                s = "$s $(data[1,1]) "
            end
            println(s)
        end
    end
end

(taken from https://github.com/yeesian/ArchGDAL.jl/blob/master/test/test_ospy_examples.jl#L200-L222)

You might find alternatives in python or R easier to work with for now, e.g.

Philippe Roy

unread,
Oct 7, 2016, 1:11:13 PM10/7/16
to julia-geo
I've succeeded with this workaround. I have not implemented it with a shapefile though. For the moment, I can only use a box.

Is there any way to make this code in parallel? For now, it's only for 1 NetCDF file, but I will need to do it for 300 files. I was hoping to do something in parallel somewhere.

# LAT-LONG BOX
LatMax = 55
LatMin = 35
LonMin = 280
LonMax = 310

# GET INFO ON THE NetCDF FILE
ncI = ncinfo(fileName)
Lat = ncread(fileName, "lat")
Lon = ncread(fileName, "lon")
Time = ncread(fileName, "time")

# FIND NUMBER OF GRID POINTS INSIDE THE LAT-LONG BOX
z = 1
for iLat in Lat
  for iLon in Lon
    if (iLat >= LatMin && iLat <= LatMax) && (iLon >= LonMin && iLon <= LonMax)
      #idxLat = findin(Lat, iLat)
      #idxLon = findin(Lon, iLon)
      #print([iLat iLon], "\n")
      #Pr[z,:] = ncread(fileName, "pr", start=[idxLon[1], idxLat[1], 1], count=[1, 1, -1])
      z += 1
    end
  end
end

# SHOW NUMBER OF GRID POINT
print("Grid points : ", z)

# DECLARE OUTPUT MATRIX
Pr = Array(Float32, z - 1, length(Time))
z = 1
for iLat in Lat
  for iLon in Lon
    if (iLat >= LatMin && iLat <= LatMax) && (iLon >= LonMin && iLon <= LonMax)
      idxLat = findin(Lat, iLat)
      idxLon = findin(Lon, iLon)
      Pr[z,:] = ncread(fileName, "pr", start=[idxLon[1], idxLat[1], 1], count=[1, 1, -1]) * 86400
      z += 1
    end
  end
end

 

Fabian Gans

unread,
Oct 7, 2016, 4:25:49 PM10/7/16
to julia-geo
If you are reading a box, I think it would be much faster to call ncread only once for the whole box. So does something like this work? (Not tested)

# LAT-LONG BOX
LatMax = 55
LatMin = 35
LonMin = 280
LonMax = 310

# GET INFO ON THE NetCDF FILE
ncI = ncinfo(fileName)
Lat = ncread(fileName, "lat")
Lon = ncread(fileName, "lon")
Time = ncread(fileName, "time")

idxlats=find(iLat->(iLat >= LatMin && iLat <= LatMax), Lat) 
idxlons=find(iLon->(iLon >=LonMin && iLon <= LonMax), Lon)

lat1=idxlats[1]
nlat=length(idxlats)

lon1=idxlons[1]
nlon=length(idxlons)

z=ncread(fileName,"pr",start=[lon1,lat1,1],count=[nlon,nlat,-1])

z2=reshape(z,(nlon*nlat,length(Time)))

Otherwise, you could also try the currently undocumented array-like access. When you do

v = NetCDF.open(filename,varname);

it will return an AbstractArray (not read into memory!), that you can index like:

z=v[idxlon1:idxlon2,idxlat1:idxlat2,:]

which will trigger reading the file. This would also allow random access of single grid cells. The only thing that the NetCDF package is not doing for you
is the transformation (lon,lat)->(idxLon,idxLat), but once you know your coordinate system, you can write a function that does this for you. 

Hope this was helpful. 

Fabian

Philippe Roy

unread,
Oct 8, 2016, 12:16:08 PM10/8/16
to julia-geo
Thanks!

I'll try tuesday, when I'll be back at work.

Philippe Roy

unread,
Oct 11, 2016, 10:12:54 AM10/11/16
to julia-geo
Both method works!

I do like the second one. With it, I can extract non-sequential points. For example, extracting index [2, 5, 30] for latitude and [20:30] for longitude.

Now, I'm looking for a function similar to "inpolygon" from Matlab. So, I could define "weird" polygon and only extract those indexes (which would probably be non-sequential). Perhaps with "Shapefile.jl" ?


Le vendredi 7 octobre 2016 16:25:49 UTC-4, Fabian Gans a écrit :

Mauro

unread,
Oct 11, 2016, 1:05:47 PM10/11/16
to juli...@googlegroups.com
On Tue, 2016-10-11 at 16:12, 'Philippe Roy' via julia-geo <juli...@googlegroups.com> wrote:
> Both method works!
>
> I do like the second one. With it, I can extract non-sequential points. For
> example, extracting index [2, 5, 30] for latitude and [20:30] for
> longitude.
>
> Now, I'm looking for a function similar to "inpolygon" from Matlab. So, I
> could define "weird" polygon and only extract those indexes (which would
> probably be non-sequential). Perhaps with "Shapefile.jl" ?

https://github.com/helenchg/PolygonClipping.jl/blob/1fc74ab797c6585795283749b7c4cc9cb2000243/src/PolygonClipping.jl#L139
>>>> you have GDAL <https://github.com/visr/GDAL.jl> and ArchGDAL
>>>> <https://github.com/yeesian/ArchGDAL.jl> set up, it might be something
>>>> - https://github.com/perrygeo/python-rasterstats which uses
>>>> - https://github.com/mapbox/rasterio and
>>>> - https://github.com/Toblerity/Fiona

Philippe Roy

unread,
Oct 11, 2016, 2:36:11 PM10/11/16
to julia-geo
Maybe it's me, but I'm unable to use the function "isinside" from the PolygonClipping.jl package. I've cloned the repository and wrote "using PolygonClipping", but after that I can't use "isinside(v, P)"

Running "include("runtests.jl") from test folder throws this error :

julia> include("runtests.jl")
ERROR: LoadError: LoadError: UndefVarError: Nothing not defined
 in include_from_node1(::String) at ./loading.jl:488
 in eval(::Module, ::Any) at ./boot.jl:234
 in require(::Symbol) at ./loading.jl:415
 in include_from_node1(::String) at ./loading.jl:488
while loading /home/proy/.julia/v0.5/PolygonClipping/src/PolygonClipping.jl, in expression starting on line 11
while loading /home/proy/.julia/v0.5/PolygonClipping/test/runtests.jl, in expression starting on line 3

Mauro

unread,
Oct 11, 2016, 3:40:02 PM10/11/16
to juli...@googlegroups.com
It has not been updated since 2014 (Julia 0.3), so it would need some
love to get going again. Maybe there are other implementations about,
if you find any, let me know. Here for some digging:
https://github.com/search?q=inside+polygon+language%3AJulia&ref=searchresults&type=Code&utf8=%E2%9C%93

I use this
https://gist.github.com/mauro3/757223ce0003ebfdd69ed881f3529784

which you're welcome to use. If I recall correctly I've run into
some edge-case bugs when I used it with very self intersecting polygon.

On Tue, 2016-10-11 at 20:36, 'Philippe Roy' via julia-geo <juli...@googlegroups.com> wrote:
> Maybe it's me, but I'm unable to use the function "isinside" from the
> PolygonClipping.jl package. I've cloned the repository and wrote "using
> PolygonClipping", but after that I can't use "isinside(v, P)"
>
> Running "include("runtests.jl") from test folder throws this error :
>
> julia> include("runtests.jl")
> ERROR: LoadError: LoadError: UndefVarError: Nothing not defined
> in include_from_node1(::String) at ./loading.jl:488
> in eval(::Module, ::Any) at ./boot.jl:234
> in require(::Symbol) at ./loading.jl:415
> in include_from_node1(::String) at ./loading.jl:488
> while loading
> /home/proy/.julia/v0.5/PolygonClipping/src/PolygonClipping.jl, in
> expression starting on line 11
> while loading /home/proy/.julia/v0.5/PolygonClipping/test/runtests.jl, in
> expression starting on line 3
>
>
> Le mardi 11 octobre 2016 13:05:47 UTC-4, Mauro a écrit :
>>
>> On Tue, 2016-10-11 at 16:12, 'Philippe Roy' via julia-geo <

Michael Krabbe Borregaard

unread,
Oct 11, 2016, 3:46:23 PM10/11/16
to juli...@googlegroups.com
LibGEOS.jl seems more recent, and defines e.g. intersects that looks like it may work? (untested)

--
You received this message because you are subscribed to the Google Groups "julia-geo" group.
To unsubscribe from this group and stop receiving emails from it, send an email to julia-geo+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Philippe Roy

unread,
Oct 12, 2016, 12:52:29 PM10/12/16
to julia-geo
Thanks Mauro! I used your code with success and included it in a little local package that I use. Just for my information, do you think there is room for some performance optimization? Just asking so that I can see if i should try to optimize it or not.

Thanks again!

Philippe Roy

unread,
Oct 12, 2016, 12:53:04 PM10/12/16
to julia-geo
I'll take a look at LibGEOS.jl later in the week. Thanks for the suggestion!
To unsubscribe from this group and stop receiving emails from it, send an email to julia-geo+...@googlegroups.com.

Michael Krabbe Borregaard

unread,
Oct 12, 2016, 1:02:53 PM10/12/16
to juli...@googlegroups.com
Great - it is great that you update on the process here, very useful.

Mauro

unread,
Oct 12, 2016, 1:15:52 PM10/12/16
to juli...@googlegroups.com
On Wed, 2016-10-12 at 18:52, 'Philippe Roy' via julia-geo <juli...@googlegroups.com> wrote:
> Thanks Mauro! I used your code with success and included it in a little
> local package that I use. Just for my information, do you think there is
> room for some performance optimization? Just asking so that I can see if i
> should try to optimize it or not.

I did optimize it, so the low hanging fruits (at least compared to my
"height") will have gone (I hope). But I'm sure there are more tweaks.
Let me know if you find any. Also let me know how it compares to
LibGEOS, so I can switch if necessary.

> Thanks again!
>
>
>
> Le mardi 11 octobre 2016 15:40:02 UTC-4, Mauro a écrit :
>>
>> It has not been updated since 2014 (Julia 0.3), so it would need some
>> love to get going again. Maybe there are other implementations about,
>> if you find any, let me know. Here for some digging:
>>
>> https://github.com/search?q=inside+polygon+language%3AJulia&ref=searchresults&type=Code&utf8=%E2%9C%93
>>
>> I use this
>> https://gist.github.com/mauro3/757223ce0003ebfdd69ed881f3529784
>>
>> which you're welcome to use. If I recall correctly I've run into
>> some edge-case bugs when I used it with very self intersecting polygon.
>>
>> On Tue, 2016-10-11 at 20:36, 'Philippe Roy' via julia-geo <

Philippe Roy

unread,
Oct 13, 2016, 3:19:39 PM10/13/16
to julia-geo
Well, I must admit that I just don't understand how to use LibGEOS.jl ! So, until i understand or find some sort of documentation on how to use the package, there won't be any benchmarking between "inpoly" function from Mauro and LibGEOS :)

Yeesian Ng

unread,
Oct 13, 2016, 6:30:04 PM10/13/16
to julia-geo
The documentation for LibGEOS is through her tests (e.g. to create a point, create a polygon, and to test whether the polygon contains the point.)

But if the "inpoly" function works for you, then that's great (:

Philippe Roy

unread,
Oct 13, 2016, 7:28:32 PM10/13/16
to julia-geo
Thanks!

I'll give a look tomorrow at work. Much appreciated! I should develop the automatism of looking into the tests suite of modules more often. :)

Philippe Roy

unread,
Oct 14, 2016, 1:13:21 PM10/14/16
to julia-geo
It does work well with LibGEOS. However, I'm still trying to convert an Array (m x 2 dimensions) to a LibGEOS.Polygon type.

Le jeudi 13 octobre 2016 18:30:04 UTC-4, Yeesian Ng a écrit :

Yeesian Ng

unread,
Oct 14, 2016, 2:27:31 PM10/14/16
to julia-geo
The ragged nature of complex polygons means you'll have to convert it to a Vector of Vector of Vectors for now. You can read this wiki for some previous thoughts on it.
Reply all
Reply to author
Forward
0 new messages