Spatial partitioning questions

Hori

Sep 6, 2020, 9:49:33 PM
to GeoSpark Discussion Board
GeoSpark seems to offer several spatial partitioning methods, but what settings are required to use them?
I'm just starting to learn GeoSpark, so this may be a simple question, but I'd be grateful for an answer.

Jia Yu

Sep 6, 2020, 9:54:06 PM
to GeoSpark Discussion Board


---------- Forwarded message ---------
From: Jia Yu <jiayu...@gmail.com>
Date: Sun, Sep 6, 2020 at 6:53 PM
Subject: Re: Spatial partitioning questions
To: Hori <hr.9.ke...@gmail.com>
Cc: GeoSpark Discussion Board <geospark-dis...@googlegroups.com>


The right partitioning method depends on the skewness of the input join data. In general, the best partitioning method is "KDB-Tree" partitioning. Spatial partitioning also requires that the data have little overlap. Data that overlaps heavily, such as trajectories, is not very suitable for Apache Sedona.

------------------------------------

Jia Yu (new email: jia...@wsu.edu)

Assistant Professor

Washington State University School of EECS

Reach me via: Homepage | GitHub
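To make the skewness point concrete, here is a toy comparison (plain Python, not Sedona code) between a uniform grid and a k-d-tree-style median split on skewed points. `grid_partition_sizes` and `median_split_sizes` are hypothetical helpers written only for this illustration, and the data is synthetic. Sedona's real KDB-Tree partitioner differs in detail (for example, it builds its boundaries from a sample of the input), so treat this as an intuition for why data-adaptive splits keep partitions balanced on skewed data, not as the library's algorithm.

```python
import random


def grid_partition_sizes(points, cells_per_axis):
    """Count points per cell of a uniform grid laid over the data's bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    w = (max_x - min_x) / cells_per_axis
    h = (max_y - min_y) / cells_per_axis
    counts = [0] * (cells_per_axis * cells_per_axis)
    for x, y in points:
        # Clamp so points on the max edge land in the last cell.
        cx = min(int((x - min_x) / w), cells_per_axis - 1)
        cy = min(int((y - min_y) / h), cells_per_axis - 1)
        counts[cx * cells_per_axis + cy] += 1
    return counts


def median_split_sizes(points, depth, axis=0):
    """Recursively split at the median, alternating x and y (k-d-tree style).

    Returns the number of points in each leaf partition.
    """
    if depth == 0 or len(points) <= 1:
        return [len(points)]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    next_axis = 1 - axis
    return (median_split_sizes(ordered[:mid], depth - 1, next_axis)
            + median_split_sizes(ordered[mid:], depth - 1, next_axis))


# Synthetic skewed input: 90% of the points sit in a small hot spot near (10, 10).
random.seed(7)
points = [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(9000)]
points += [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(1000)]

grid = grid_partition_sizes(points, 4)  # 16 equal-area cells
kd = median_split_sizes(points, 4)      # 16 median-split leaves
print("uniform grid:", max(grid), "max /", min(grid), "min")
print("median split:", max(kd), "max /", min(kd), "min")
```

With 90% of the points in one hot spot, the uniform grid puts at least 9,000 of the 10,000 points into a single cell, while every median-split leaf holds exactly 625 points, so the work per partition stays even.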

Hori

Sep 6, 2020, 10:29:44 PM
to GeoSpark Discussion Board
Thank you for the helpful reply. So the appropriate partitioning method depends on the input data.
Now I'd like to compare the different spatial partitioning methods in Apache Sedona. How do I specify which spatial partitioning method to use in my program?


On Monday, September 7, 2020 at 10:54:06 UTC+9 Jia Yu wrote:

Jia Yu

Sep 6, 2020, 10:33:35 PM
to GeoSpark Discussion Board
You can find the tutorial on how to use spatial partitioning with RDDs in Java, Scala, SQL, and Python here: https://apache.github.io/incubator-sedona/tutorial/rdd/

The spatial partitioning section is under "Spatial join query" in every tutorial. You can also search the website.
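Why the partitioning settings sit under "Spatial join query": both join inputs have to be split with the same partition boundaries so that matching geometries meet in the same partition. The Python sketch below illustrates only that idea and is not Sedona code. `make_grid`, `partition`, and `partitioned_join` are hypothetical helpers, a uniform 2 x 2 grid stands in for KDB-Tree leaves, and rectangles are plain `(min_x, min_y, max_x, max_y)` tuples. Geometries that cross a boundary are copied into every partition they touch, so matches are deduplicated at the end. In the Sedona RDD tutorial, the same-boundaries requirement appears as partitioning one dataset (for example with `GridType.KDBTREE`) and passing its `getPartitioner` result to the other; check the tutorial for the exact calls in your version.

```python
from itertools import product


def intersects(a, b):
    """Closed-rectangle intersection test; rectangles are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def make_grid(bounds, n):
    """Split a bounding box into n x n equal cells (hypothetical stand-in for KDB-Tree leaves)."""
    min_x, min_y, max_x, max_y = bounds
    w = (max_x - min_x) / n
    h = (max_y - min_y) / n
    return [(min_x + i * w, min_y + j * h, min_x + (i + 1) * w, min_y + (j + 1) * h)
            for i in range(n) for j in range(n)]


def partition(objects, cells):
    """Replicate each (id, rect) into every cell it touches."""
    parts = [[] for _ in cells]
    for oid, rect in objects:
        for k, cell in enumerate(cells):
            if intersects(rect, cell):
                parts[k].append((oid, rect))
    return parts


def partitioned_join(left, right, cells):
    """Join two datasets partitioned with the SAME cells; return (distinct pairs, raw matches)."""
    left_parts = partition(left, cells)
    right_parts = partition(right, cells)
    raw_matches = 0
    result = set()  # a pair found in several cells is kept only once
    for lp, rp in zip(left_parts, right_parts):
        for (lid, lrect), (rid, rrect) in product(lp, rp):
            if intersects(lrect, rrect):
                raw_matches += 1
                result.add((lid, rid))
    return result, raw_matches


left = [("a", (0, 0, 2, 2)), ("b", (4, 4, 6, 6)), ("c", (9, 9, 11, 11))]
right = [("q1", (1, 1, 5, 5)), ("q2", (8, 8, 12, 12)), ("q3", (5, 5, 7, 7))]
cells = make_grid((0, 0, 12, 12), 2)

pairs, raw = partitioned_join(left, right, cells)
# Brute-force answer without any partitioning, for comparison.
expected = {(l, r) for l, lr in left for r, rr in right if intersects(lr, rr)}
print(sorted(pairs), "raw matches:", raw)
```

Here `b` and `q3` both straddle the grid center and match in all four cells, so the join finds 7 raw matches for 4 distinct pairs; the set removes the repeats and the result equals the brute-force answer.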

Hori

Sep 6, 2020, 10:39:48 PM
to GeoSpark Discussion Board
Thank you. I will refer to the site.

On Monday, September 7, 2020 at 11:33:35 UTC+9 jiayu...@gmail.com wrote:

Hori

Sep 24, 2020, 7:08:26 AM
to GeoSpark Discussion Board
I have another question.

The site says that spatial partitioning is very effective for join queries. Is it not effective for range queries and kNN queries?

On Monday, September 7, 2020 at 11:39:48 UTC+9 Hori wrote:

Jia Yu

Sep 24, 2020, 2:18:09 PM
to Hori, GeoSpark Discussion Board
Hello

You should not use spatial partitioning for range queries or kNN queries. It will produce duplicate results.

Thanks,
Jia
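A minimal illustration of where duplicates can come from, assuming (as a simplification, not a description of Sedona's internals) that a geometry crossing a partition boundary is stored in every partition it touches. If a range query is then evaluated independently in each partition and the results are concatenated, that geometry is returned once per partition; a kNN query can likewise see the same object as a candidate in several partitions. `make_grid`, `partition`, and `range_query_per_partition` are hypothetical, self-contained helpers.

```python
def intersects(a, b):
    """Closed-rectangle intersection test; rectangles are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def make_grid(bounds, n):
    """Split a bounding box into n x n equal cells (hypothetical partition layout)."""
    min_x, min_y, max_x, max_y = bounds
    w = (max_x - min_x) / n
    h = (max_y - min_y) / n
    return [(min_x + i * w, min_y + j * h, min_x + (i + 1) * w, min_y + (j + 1) * h)
            for i in range(n) for j in range(n)]


def partition(objects, cells):
    """Replicate each (id, rect) into every cell it touches."""
    parts = [[] for _ in cells]
    for oid, rect in objects:
        for k, cell in enumerate(cells):
            if intersects(rect, cell):
                parts[k].append((oid, rect))
    return parts


def range_query_per_partition(objects, cells, window):
    """Run the range query in each partition separately and concatenate the hits."""
    hits = []
    for part in partition(objects, cells):
        hits.extend(oid for oid, rect in part if intersects(rect, window))
    return hits


objects = [("x", (4, 4, 8, 8)), ("y", (1, 1, 2, 2))]  # "x" straddles the grid center
cells = make_grid((0, 0, 12, 12), 2)
window = (0, 0, 5, 5)

print(range_query_per_partition(objects, cells, window))       # ['x', 'y', 'x', 'x', 'x']
print([oid for oid, rect in objects if intersects(rect, window)])  # ['x', 'y']
```

Without partitioning, each object is checked once, so no deduplication step is needed, which is one reason range and kNN queries are usually run on the unpartitioned data.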
