GIS Spatial Analysis using Citus as Database and PostGIS as extension


Behzad Varaminiyan

Sep 15, 2016, 12:40:45 PM
to citus-users
Hello there,
I am researching Databases for storing and analyzing GIS spatial data. PostGIS (and PostgreSQL, of course) seem very suitable for our use cases. 
We also intend to use PostgreSQL functions (Stored Procedures) for custom spatial analysis on that data.
However, we need a database that not only distributes data across multiple PostgreSQL instances, but also analyzes data locally in each instance.
Considering all the above, I concluded that Citus Data must be appropriate for us.

What does the community think? Have I made any mistake? Is it feasible?

Andres Freund

Sep 15, 2016, 1:21:09 PM
to Behzad Varaminiyan, citus-users
Hello,

On 2016-09-15 09:40:44 -0700, Behzad Varaminiyan wrote:
> I am researching Databases for storing and analyzing GIS spatial data.
> PostGIS (and PostgreSQL, of course) seem very suitable for our use cases.
> We also intend to use PostgreSQL functions (Stored Procedures) for custom
> spatial analysis on that data.
> However, we need a database that not only distributes data across multiple
> PostgreSQL instances, but also analyzes data locally in each instance.
> Considering all the above, I concluded that Citus Data must be appropriate
> for us.

Could you expand on what exactly you mean by "analyzing data locally
in each instance"? Do you mean that computations should be distributed
across multiple nodes, to use more CPU in total and get faster responses?
Or that you want some non-distributed tables?


> What does the community think? Have I made any mistake? Is it feasible?

Citus might be suitable for the use case. But it depends heavily on what
kind of queries you want to run in parallel, and what kind of
analytics you're performing. Citus has some restrictions on the kinds
of queries it can execute over distributed tables, and those need to
match your application's architecture.


Greetings,

Andres Freund

Behzad Varaminiyan

Sep 15, 2016, 1:45:38 PM
to citus-users
The computation must be distributed across multiple nodes. Response time is not our top priority.

Mostly, we are talking about map overlays (combining multiple layers) and statistics (counting).

I believe that will lead to COUNT queries across multiple distributed tables, with the results saved as distributed tables.
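
To make the pattern concrete, here is a rough Python sketch of a count over two overlaid layers that is computed per shard and then summed. It is only a toy model, not Citus or PostGIS code: the `region_id` distribution key, the modulo placement rule, the shard count, and the use of bounding boxes in place of real geometries are all assumptions made for illustration.

```python
from collections import defaultdict

SHARD_COUNT = 4  # assumed number of shards; purely illustrative


def shard_for(region_id):
    """Hypothetical placement rule: map a distribution key to a shard."""
    return region_id % SHARD_COUNT


def boxes_intersect(a, b):
    """Bounding boxes are (xmin, ymin, xmax, ymax); stands in for a real overlay test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def distribute(rows):
    """Group (region_id, box) rows by shard, so both layers are placed the same way."""
    shards = defaultdict(list)
    for region_id, box in rows:
        shards[shard_for(region_id)].append((region_id, box))
    return shards


def local_overlay_count(layer_a, layer_b):
    """Count intersecting pairs within one shard, joining on region_id."""
    return sum(
        1
        for region_a, box_a in layer_a
        for region_b, box_b in layer_b
        if region_a == region_b and boxes_intersect(box_a, box_b)
    )


def distributed_overlay_count(layer_a_rows, layer_b_rows):
    """Each shard counts locally; the caller only sums the partial counts."""
    a_shards = distribute(layer_a_rows)
    b_shards = distribute(layer_b_rows)
    partials = {
        shard: local_overlay_count(a_shards.get(shard, []), b_shards.get(shard, []))
        for shard in range(SHARD_COUNT)
    }
    return sum(partials.values()), partials
```

The point of the sketch is that the join includes the distribution key, so every matching pair lives on the same shard and no rows need to move between nodes before counting.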

Jason Petersen

Sep 23, 2016, 5:56:16 PM
to citus-users
We'll probably need a few more details to be able to give you a good picture of what's possible to compute in a distributed fashion. Do you have some (possibly sanitized) queries and a table schema we could look at? Citus is capable of distributing the computations within a query to better utilize the compute power at each worker node, but without concrete queries it's hard to tell you precisely what the system would do with a given query.

Behzad Varaminiyan

Oct 1, 2016, 5:55:50 AM
to citus-users
Currently, I have no queries to post, since the project is still in the research stage.
At the moment, I'm just researching the limitations.
Could you give a general summary of the limitations Citus imposes compared to plain PostgreSQL?