ST_Intersects question


ashish agarwal

Apr 23, 2020, 10:53:41 AM
to GeoSpark Discussion Board
We have two polygon RDDs and want to find out whether boundaries in rdd1 intersect boundaries in rdd2.

We ran a simple spatial join using a distance query in SQL, with the suggested optimizations (Quadtree and the Kryo serializer). Because the datasets are very large, the job runs for several hours on a big cluster, and we want to optimize it.

I was thinking of something like this:

1. st_intersects(st_envelope(df1.geom), st_envelope(df2.geom))
2. st_intersects(df1.geom, df2.geom), only for those records that satisfy 1

Will this help, or does st_intersects automatically build an envelope before computing the actual intersection?
Most of the geometries will be filtered out in step 1, because we have data from multiple geographies (for example, one state's boundary will not intersect another state's).

ashish agarwal

May 6, 2020, 12:51:03 PM
to GeoSpark Discussion Board

Jia Yu

May 6, 2020, 5:06:03 PM
to ashish agarwal, GeoSpark Discussion Board
Hi Ashish,

When processing ST_Intersects, GeoSpark already does the following: it first filters using the bounding boxes of the geospatial objects, then uses their actual shapes in the refine phase.
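To make the filter-and-refine idea concrete, here is a minimal, self-contained sketch in plain Python. It is not GeoSpark's code (GeoSpark relies on its own geometry library and spatial indexes); the function names, the simple-polygon representation, and the sample shapes are all made up for illustration. It only shows why adding a manual st_envelope pre-check is redundant: the cheap bounding-box test already decides which pairs reach the expensive exact test.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # simple polygon: ring of vertices, last vertex not repeated


def envelope(poly: Polygon) -> Tuple[float, float, float, float]:
    """Bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b) -> bool:
    """Filter phase: cheap box-overlap test (touching boxes count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Cross product sign tells which side of line pq the point r lies on."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    """Given r collinear with pq, is r within the extent of segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or collinear-overlap cases
    return ((d1 == 0 and _on_segment(p3, p4, p1))
            or (d2 == 0 and _on_segment(p3, p4, p2))
            or (d3 == 0 and _on_segment(p1, p2, p3))
            or (d4 == 0 and _on_segment(p1, p2, p4)))


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _edges(poly: Polygon):
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def polygons_intersect(a: Polygon, b: Polygon) -> bool:
    """Refine phase: exact test on the real shapes."""
    if any(segments_intersect(p1, p2, p3, p4)
           for p1, p2 in _edges(a) for p3, p4 in _edges(b)):
        return True
    # No edges touch, so one polygon contains the other or they are disjoint.
    return point_in_polygon(a[0], b) or point_in_polygon(b[0], a)


def filter_refine_join(left: List[Polygon], right: List[Polygon]):
    """Return matching index pairs and how many exact tests were needed."""
    right_envs = [envelope(p) for p in right]
    matches, exact_tests = [], 0
    for i, a in enumerate(left):
        env_a = envelope(a)
        for j, b in enumerate(right):
            if not envelopes_intersect(env_a, right_envs[j]):
                continue  # rejected by the filter phase, no exact test
            exact_tests += 1
            if polygons_intersect(a, b):
                matches.append((i, j))
    return matches, exact_tests


# Hypothetical sample data
square_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
triangle = [(0, 0), (4, 0), (0, 4)]
square_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
far_away = [(10, 10), (11, 10), (11, 11), (10, 11)]
corner = [(3, 3), (4, 3), (4, 4), (3, 4)]  # inside the triangle's box, outside the triangle

matches, exact_tests = filter_refine_join([square_a, triangle],
                                          [square_b, far_away, corner])
print(matches, exact_tests)  # [(0, 0), (1, 0)] 3
```

Of the six candidate pairs, only three reach the exact test, and one of those (triangle vs. corner) is a false positive of the bounding-box filter that the refine phase removes.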

Please consider increasing the number of partitions or using the KDBTree partitioning method.
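For a SQL-based join, one way to apply this advice might be through Spark configuration. The sketch below only builds the settings as a plain dictionary. The two keys, `geospark.join.gridtype` and `geospark.join.numpartition`, are quoted from memory of the GeoSparkSQL parameter documentation and should be checked against the GeoSpark version in use; the helper name and the partition count of 1000 are hypothetical.

```python
from typing import Dict


def geospark_join_settings(grid_type: str = "kdbtree",
                           num_partitions: int = 1000) -> Dict[str, str]:
    """Build Spark config entries for a GeoSparkSQL spatial join.

    Assumption: the key names below match the GeoSparkSQL parameter docs
    for your version. Only the two grid types named in this thread are accepted.
    """
    if grid_type not in {"quadtree", "kdbtree"}:
        raise ValueError(f"unsupported grid type: {grid_type}")
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return {
        "geospark.join.gridtype": grid_type,
        "geospark.join.numpartition": str(num_partitions),
    }


settings = geospark_join_settings()

# Applying the settings requires pyspark, so it is left commented out here:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in settings.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

A sensible partition count depends on data size and cluster resources, so treat the number as a starting point for experimentation rather than a recommendation.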

Thanks,
Jia

------------------------------------

Jia Yu

Ph.D. Candidate in Computer Science


