[ANN] DBaseReader.jl

Penn Taylor

Oct 19, 2015, 12:52:38 AM
to julia-geo
I've been using Yeesian's Shapefile.jl package for about a week to read in the US Census Bureau's shapefiles, and then using the polygons to generate choropleths with Gadfly. I got tired of using Python to convert the .dbf pieces to a .csv so that I could deal with all the record data in Julia, so I wrote a plain Julia package to do the job. I managed to tweak the performance so it can parse a binary dbf file at about the same speed as the DataFrames package can read in the equivalent csv.
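
For anyone curious what parsing the binary format involves, here is a minimal sketch of reading the fixed 32-byte dbf file header, written for current Julia (1.x) syntax. It is not DBaseReader's actual code: `DBFHeader` and `read_dbf_header` are hypothetical names, and the byte offsets assume the commonly documented dBase III header layout.

```julia
# Hypothetical sketch, not DBaseReader's code: the fixed part of a dbf header.
struct DBFHeader
    version::UInt8      # file type / version flag
    nrecords::UInt32    # number of records in the table
    headerlen::UInt16   # total header size in bytes
    recordlen::UInt16   # size of one record in bytes
end

function read_dbf_header(io::IO)
    version = read(io, UInt8)             # byte 0
    skip(io, 3)                           # bytes 1-3: last-update date (YY MM DD)
    nrecords = ltoh(read(io, UInt32))     # bytes 4-7, little-endian
    headerlen = ltoh(read(io, UInt16))    # bytes 8-9, little-endian
    recordlen = ltoh(read(io, UInt16))    # bytes 10-11, little-endian
    skip(io, 20)                          # bytes 12-31: reserved / flags
    return DBFHeader(version, nrecords, headerlen, recordlen)
end
```

With a file on disk, `open(read_dbf_header, "counties.dbf")` would return the header; the file name is only a placeholder. The 32-byte field descriptors follow immediately after these bytes and are not handled in this sketch.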

Thought you folks might be interested. Two big caveats: 1) I haven't written proper tests for it, and 2) I developed this atop Julia 0.5-dev and have not yet dropped back to 0.4 to test it there. It's entirely possible there will be some type goofiness that prevents its running on 0.4. I expect to have time later in the week to write tests and ensure compatibility with Julia 0.4.

Penn Taylor

Oct 19, 2015, 12:55:37 AM
to julia-geo

Yeesian Ng

Oct 19, 2015, 1:05:40 AM
to julia-geo
Nice work! Might you be interested in submitting a PR to the Shapefile.jl repo? Keno wrote most of it, but I'll be happy to help review it when it's ready.

Fabian Gans

Oct 19, 2015, 4:31:08 AM
to julia-geo
You might also consider adding the format to FileIO.jl (https://github.com/JuliaIO/FileIO.jl).

Penn Taylor

Oct 20, 2015, 5:12:29 PM
to juli...@googlegroups.com
Folding the DBaseReader code into the Shapefile repo would be great, as would registering with FileIO.

I have added tests and confirmed that DBaseReader works with Julia 0.4 -- it's definitely *not* compatible with 0.3 in its current state. It would be great if a couple of people would be willing to clone the current repo, try opening a few dbfs, and file issues if there are errors.

If there are no outstanding issues in the next few days, I'll open an issue on Shapefile.jl to discuss what needs to be done to DBaseReader to best integrate it with Shapefile.

Yeesian Ng

Oct 20, 2015, 8:04:50 PM
to julia-geo
Given ESRI's specification, it makes sense to bundle it together with Shapefile. After that, we can start the discussion of the packages that should be moved to FileIO.

Levi John Wolf

Oct 22, 2015, 10:40:32 PM
to julia-geo
This reader is solid.

I've tested it on all of these, which should cover the entire set of primitives in shp/dbf combinations.

The only issue was on one column in the "taz" example sets, where the dbf has an "F" type column. I'm not sure, but it may suffice to just read an F as N?
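
To make the suggestion concrete, here is a rough sketch of a per-field parser that treats F exactly like N. It assumes that both types store the number as ASCII text padded with spaces, and `parse_field` is a hypothetical name, not a function from DBaseReader. Returning `missing` for a blank numeric field is also a choice made for this sketch.

```julia
# Hypothetical sketch: turn one raw dbf field (ASCII text) into a Julia value.
function parse_field(raw::AbstractString, fieldtype::Char)
    if fieldtype == 'C'
        # Character data: drop the trailing space padding, keep the text.
        return String(rstrip(raw))
    elseif fieldtype in ('N', 'F')
        # Assumption: F is stored like N, as a number written in ASCII.
        s = strip(raw)
        isempty(s) && return missing      # blank field: no value recorded
        return parse(Float64, s)
    else
        throw(ArgumentError("unsupported dbf field type '$fieldtype'"))
    end
end
```

If F columns in some files turn out to use a different encoding, only the `'F'` case would need its own branch.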

Levi John Wolf

Oct 22, 2015, 10:46:13 PM
to julia-geo
Also, on the topic of pairing dbf/shp, it might make sense to subtype DataFrame, adding a "geometry" column that's used to enable spatial operations, like in PostGIS or GeoPandas. This could serve as a reasonable "destination type" for tabular geodata.

Penn Taylor

Oct 23, 2015, 1:36:55 PM
to julia-geo
Thanks for testing and pointing out the 'F' type issue. I just committed changes to support that type.

There's some hesitancy/resistance to having a dependency on the full DataFrames package built into Shapefile, so I've changed the interface slightly to return a Dict of DataArrays rather than a DataFrame. Is the DataFrame converter in GeoConverters.jl similar to what you're picturing as a destination type?

Levi John Wolf

Oct 23, 2015, 1:54:08 PM
to julia-geo
Yes, exactly. But I think it makes more sense to codify the special structure of the Dict/DataFrame in a subtype, rather than using functions to convert between generics.

Maybe consolidating around GeoInterface feature collections could do this. So, for example, I'm thinking that GeoJSON.parsefile("test.json", FeatureCollection) targets the FeatureCollection type and GeoJSON.parsefile("test.json", GeoDataFrame) would target some subtype of DataFrame with a geometry field.
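
A small sketch of that "pick the destination by type" idea, using Julia's dispatch on a type argument. Everything here is a hypothetical stand-in: `Feature`, `FeatureCollection`, `GeoTable`, and `build` are not GeoInterface or GeoJSON.jl definitions, and `GeoTable` is a plain struct rather than a DataFrame subtype so that the example has no dependencies. It is written for current Julia (1.x) syntax.

```julia
# Hypothetical stand-in types, not taken from any existing package.
struct Feature
    geometry::Any
    properties::Dict{String,Any}
end

struct FeatureCollection
    features::Vector{Feature}
end

struct GeoTable
    geometry::Vector{Any}                 # the "geometry" column
    columns::Dict{String,Vector{Any}}     # one vector per attribute
end

# Row-oriented destination: keep the features as they are.
build(::Type{FeatureCollection}, feats::Vector{Feature}) = FeatureCollection(feats)

# Column-oriented destination: one vector per property plus a geometry column.
# Assumes every feature carries the same set of property names.
function build(::Type{GeoTable}, feats::Vector{Feature})
    columns = Dict{String,Vector{Any}}()
    for f in feats
        for (name, value) in f.properties
            push!(get!(columns, name, Any[]), value)
        end
    end
    return GeoTable(Any[f.geometry for f in feats], columns)
end
```

Calling `build(FeatureCollection, feats)` or `build(GeoTable, feats)` on the same input then selects the destination, which is the same shape as passing the target type as the second argument to a `parsefile`-style function.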

This way, we can leverage changes to JuliaGeometry primitives, while retaining a coherent "destination" for raw Julia IO as well as whatever OGR wrapper is made.

Regardless, this idea should probably move to another thread.
