Hi all,
I'm wondering about the differences between sjoin and overlay, and whether there is any guidance on when to prefer one over the other.
My current use case is a point-in-polygon join, so that each point in the input GeoDataFrame picks up the attributes of whichever polygon it intersects. More specifically, I have a gridded set of points that lies directly on top of about a dozen polygons, and I can confirm that the polygons are non-overlapping.
I first tried this with overlay, but as the attached image shows, a significant number of points weren't tagged with attributes from the polygons. Blue marks the points that were tagged with polygon data, and orange marks the points that were not.
gpd.overlay(gdf_points, gdf_polygons, how='intersection')
I tried again using sjoin, and this time both the blue and orange sets were correctly tagged with values from the polygons underneath.
gpd.sjoin(gdf_points, gdf_polygons, how='inner', op='intersects')
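For anyone who wants to reproduce the sjoin behaviour, here is a minimal self-contained sketch. The data are made up (a short row of points over two adjacent boxes, plus one point outside both), and the column names point_id and name are hypothetical stand-ins for my real attributes. It relies on sjoin's default predicate, which is intersects, so it does not depend on whether the keyword is spelled op (as in my call above) or predicate (as I understand newer releases name it).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for gdf_points and gdf_polygons from the post.
# Four points fall inside the boxes; the last one (x = 5.5) lies outside both.
gdf_points = gpd.GeoDataFrame(
    {"point_id": [0, 1, 2, 3, 4]},
    geometry=[Point(x + 0.5, 0.5) for x in (0, 1, 2, 3, 5)],
)
gdf_polygons = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 2, 1), box(2, 0, 4, 1)],  # adjacent, non-overlapping
)

# Inner join: keeps only points that intersect a polygon and attaches
# that polygon's attributes (default predicate is "intersects").
joined = gpd.sjoin(gdf_points, gdf_polygons, how="inner")

# Left join: keeps every point; points with no matching polygon get
# missing values in the polygon columns, which makes them easy to find.
left = gpd.sjoin(gdf_points, gdf_polygons, how="left")
untagged = left[left["name"].isna()]

print(joined[["point_id", "name"]])
print(untagged[["point_id"]])
```

The left join is a convenient way to list exactly which points were not tagged, rather than inferring it from a plot.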
The overlay documentation says that it only supports GeoDataFrames with the same geometry type, which I find confusing, since at least a subset of points was tagged correctly when I ran overlay on a point GeoDataFrame and a polygon GeoDataFrame. The "Set operations with overlay" page also seems to suggest that overlay is best suited to polygon-polygon operations, but is there more definitive guidance on when to use sjoin versus overlay?
Thanks!
Jon