Difference between sjoin and overlay

Jonathan Cusick

Nov 11, 2020, 3:21:01 PM

to geopandas

Hi all,

I'm wondering about the differences between using sjoin and overlay and any guidance on when to prefer one over the other.

My current use case is trying to perform a point-in-polygon join, so that the input GeoDataFrame of points will have the attributes of whichever polygon they intersect. More specifically, I have a gridded set of points that is directly on top of a set of about a dozen different polygons and can confirm that the polygons are non-overlapping.

I'd first tried this with overlay, but as seen in the attached image, a significant number of points weren't tagged with attributes from the polygons. Blue shows the set of points that have been tagged with polygon data and orange shows the set of points that were not.

gpd.overlay(gdf_points, gdf_polygons, how='intersection')

I tried again using sjoin and this time, both the blue and orange sets were correctly tagged with values from the polygons underneath.

gpd.sjoin(gdf_points, gdf_polygons, how='inner', op='intersects')

Looking at the overlay documentation, it mentions that it only supports GeoDataFrames that have the same geometry type, which seems confusing since at least a subset of points were tagged correctly when using overlay with a point GeoDataFrame and a polygon GeoDataFrame. The documentation page on set operations with overlay also seems to suggest that overlay is best suited for polygon-polygon operations, but is there more definitive guidance on when to use sjoin versus overlay?

Thanks!

Jon

sjoin_vs_overlay.png

Martin Fleischmann

Nov 13, 2020, 5:50:52 AM

to geop...@googlegroups.com, Jonathan Cusick

Hi Jonathan,

The main difference between sjoin and overlay is that sjoin merges attributes from the other GeoDataFrame onto the existing geometry, while overlay creates new geometry (as an intersection, difference, …).

In the case of points and polygons, you will not see a difference, because the intersection of a point and a polygon is always the same point. But if you want to merge two sets of polygons, overlay might be the better option for you.
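To illustrate the point-in-polygon case, here is a minimal sketch using made-up data. The frames, the column names (pt_id, zone) and the coordinates are all hypothetical stand-ins, not the data from the original question. It relies on sjoin's default "intersects" test and uses a left join so that points falling outside every polygon stay in the result and can be inspected.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for gdf_polygons and gdf_points.
gdf_polygons = gpd.GeoDataFrame(
    {"zone": ["a", "b"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)
gdf_points = gpd.GeoDataFrame(
    {"pt_id": [0, 1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 1.5), Point(3.0, 1.0), Point(9.0, 9.0)],
)

# A left join keeps every point; sjoin tests "intersects" by default.
joined = gpd.sjoin(gdf_points, gdf_polygons, how="left")

# The geometry column still holds the original points, now with polygon attributes.
print(joined[["pt_id", "zone", "geometry"]])

# Points that matched no polygon have missing values in the joined columns.
unmatched = joined[joined["index_right"].isna()]
print(unmatched["pt_id"].tolist())
```

Checking the unmatched rows this way is one option for spotting points that a join failed to tag.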

So the rule of thumb could be as follows:

I want to keep the geometry I have and just link attributes -> sjoin

I want new geometry as the result of an operation between two dataframes -> overlay
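The rule of thumb shows up most clearly with two polygon layers. The sketch below uses two hypothetical overlapping squares (the names left, right, left_name and right_name are invented for illustration): sjoin returns the left square unchanged with the right attributes attached, while overlay returns only the overlapping piece as new geometry.

```python
import geopandas as gpd
from shapely.geometry import box

# Two hypothetical 2x2 squares that overlap in a 1x1 corner.
left = gpd.GeoDataFrame({"left_name": ["L"]}, geometry=[box(0, 0, 2, 2)])
right = gpd.GeoDataFrame({"right_name": ["R"]}, geometry=[box(1, 1, 3, 3)])

# sjoin: geometry of the left frame is kept as-is, attributes are linked.
joined = gpd.sjoin(left, right, how="inner")

# overlay: a new geometry is computed from both frames.
overlaid = gpd.overlay(left, right, how="intersection")

print(joined.geometry.area.iloc[0])    # area of the original left square
print(overlaid.geometry.area.iloc[0])  # area of the overlapping piece only
```

Both results carry left_name and right_name; only the geometry differs.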

Martin

Jonathan Cusick

Nov 16, 2020, 8:35:57 AM

to Martin Fleischmann, geop...@googlegroups.com

Hi Martin,

Thanks a bunch for the clarification; I'll definitely work from that going forward. The behavior I'm getting with overlay might be a bug, though, since some of the points aren't tagged correctly compared to sjoin. I'll look into it further on my end and raise an issue on GitHub if it still seems like something strange is happening.