How does Citusdata compare to Postgres-XL?

5,497 views
Skip to first unread message

Francisco Reyes

unread,
Apr 29, 2016, 9:59:10 PM4/29/16
to citus-users
I have been tasked to find an open source solution to do OLAP for a client. The solution needs to be Postgres derived and two that I am starting to research are Citusdata and Postgres-XL.

Does anyone know of a comparison sheet between the two? Or pointer to previous discussions. Not finding much in terms of direct comparisons.

Initial data set will be  7TB.

Background:
* Client has an OLTP cluster (postgres) and an OLAP cluster (Commercial version of GreenPlum)
* I have been tasked with researching options other than the now open source GreenPlum
* Another team, greenplum DBAs, will evaluate the open source GreenPlum.
* Due to security concerns client would prefer to not use a provider solution such as RedShift, but to have their own hardware along with their other infrastructure in the data center.
* Availability for commercial support a plus. From what I am finding seems both CitusData and Postgres-XL have entities which offer commercial support.

Marco Slot

unread,
May 2, 2016, 9:11:15 AM5/2/16
to citus-users
Each of the systems you mentioned can partition data over many machines and parallelize SELECT queries. However, they each address different use-cases. Can you share more details about the use-cases you're targeting? You can also contact us privately.

One of the biggest benefits of Citus over other solutions is that it is an extension to PostgreSQL 9.5, rather than a fork. This means that you can always use the latest PostgreSQL features and performance enhancements, use it in combination with other extensions, and operate it as you would a normal PostgreSQL database.

Citus is geared towards a high rate of fast, simple transactions (single statement), high throughput bulk loading, and a high rate of subsecond analytical queries. It also integrates with extensions like cstore_fdw for columnar storage and hll for fast distinct counts. A limitation is that it does not support all SQL queries or complex transactions.

Postgres-XL is geared towards complex transactions, but with more per-transaction overhead. As far as I understand, it only uses one core per machine for analytical queries, which limits the speed-up. Another limitation is the lack of built-in fail-over for data nodes.

GreenPlum and Redshift are data warehouses, which are geared towards high throughput bulk loading and answering complex analytical queries in a few seconds or minutes. Limitations of data warehouses include high planning overhead and low query rates.

If you need interactive analytics (e.g. in an application with many users) or need to scale out simple transactions: Citus
If you need multi-statement ACID transactions: Postgres-XL
If you need to run complex reporting queries: GreenPlum / Redshift

Hope this helps,
Marco

awo...@silversky.com

unread,
May 3, 2016, 9:50:44 AM5/3/16
to citus-users

Marco, I'm not sure how to read this.  How does Citus facilitate high speed analytical queries or interactive if it doesn't support all SQL queries?  The queries seem only to be optimized on the partitioning column(s).  I've observed it sometimes won't sort queries on foreign Citus tables or do analytical windowing/ranking etc. or am I missing something?

Thanks,

Andrew Wolfe

On Monday, May 2, 2016 at 9:11:15 AM UTC-4, Marco Slot wrote:
Citus is geared towards ... a high rate of subsecond analytical queries. ...
 
If you need interactive analytics (e.g. in an application with many users) ... Citus

Francisco Reyes

unread,
May 3, 2016, 10:10:27 AM5/3/16
to citus-users
On Monday, May 2, 2016 at 9:11:15 AM UTC-4, Marco Slot wrote:
If you need interactive analytics (e.g. in an application with many users) or need to scale out simple transactions: Citus
If you need multi-statement ACID transactions: Postgres-XL
If you need to run complex reporting queries: GreenPlum / Redshift


Marco,

Thanks for the summary; very informative. The initial case is complex reporting queries coming from multiple departments and the DBA team only gets involved on helping optimize after we see slow queries.

Having said that, I think will explore how Citus/Postgres-XL may help in some other cases.

We have 
* Pure OLTP - high rate of operations with mostly simple statements
* Analytics/complex queries over TBs of data - fewer (under 100) connections of mostly complex queries over large sets.
* Middle ground cases where the rate of connections, or the amount of connections, is not as fast as OLTP, the data is not so large as the warehouse, the queries not so complex, but the amount of data is bigger than most of our OLTP DBs (say 100GB). Wondering if this may be a good fit for CitusData.

Is there a link with CitusData limitations and/or the type of queries it does not support? Looking here https://www.citusdata.com/docs/citus/5.0/index.html, not seeing list of limitations.

Marco Slot

unread,
May 3, 2016, 11:36:45 AM5/3/16
to citus-users
On Tuesday, May 3, 2016 at 3:50:44 PM UTC+2, awo...@silversky.com wrote:

Marco, I'm not sure how to read this.  How does Citus facilitate high speed analytical queries or interactive if it doesn't support all SQL queries?  The queries seem only to be optimized on the partitioning column(s).  I've observed it sometimes won't sort queries on foreign Citus tables or do analytical windowing/ranking etc. or am I missing something?

Citus currently targets applications, whereas data warehouses primarily target end-users that manually write SQL. If you have an interactive application where the user can change many filters, ordering, group by settings etc., you need a database that can respond to these queries quickly and can handle many of them at the same time. If your data is big or fast-growing, you need horizontal scale, which means you need Citus. Data warehouses cannot serve as an application back-end because of the planning overhead and poor concurrency.

Queries that cannot be parallelized along the partition dimension will not be very fast on any system. Some systems offer more SQL functionality by pulling data into one place when the query cannot be distributed (Postgres-XL, Oracle RAC, MySQL cluster). However, this is not useful for applications. It is much more useful to rewrite queries in a way that can be efficiently parallelized, which Citus helps you with. For example, Citus can scale out distinct counts using HLL, and we offer a session analytics package which can compute funnel queries in scalable way. More importantly, you can use arbitrary functions and most simple queries (including aggregates, joins, filters, group by, order by, subqueries in the from clause, etc.) just work out-of-the-box.

Citus can also act as a type of NoSQL datastore, if each query specifies a partition key value. In that case, it uses a separate planner and executor to send queries to the right shard with minimal overhead. This allows you to scale out your memory and CPU power. The fact that Citus can act both as a scalable data store and as an analytics platform, can keep real-time and historical data, and can even pre-process your data [1], makes it quite powerful as far as distributed databases go.

Of course, there are still many features that we'd like to add. If you have a strong need for one, don't hesitate to contact us or comment on/open an issue: https://github.com/citusdata/citus/


Hope this helps,
Marco

mkr...@trialfire.com

unread,
Dec 8, 2016, 5:40:22 PM12/8/16
to citus-users
Hi Francisco,

  I'm wondering if you've moved forward with this. Were you able to benchmark either Citus or Postgres-XL ? We're in the same boat so I'm curios about your progress.

-Max

Francisco Reyes

unread,
Dec 8, 2016, 5:49:03 PM12/8/16
to mkr...@trialfire.com, citus-users
No. Did not move on it. However, for my use case I likely will use Citus in the future; at least until Postgres-XL improves how it handles failure. Last I checked it was not very good at it, whereas Citus can re-point traffic even in flight. Also, citus will eventually have writing to any node instead of having to use a coordinator so in essence becoming multi-master.

--
You received this message because you are subscribed to a topic in the Google Groups "citus-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/citus-users/VhDYliYuLXk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to citus-users+unsubscribe@googlegroups.com.
To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/citus-users/220f2d23-3174-48e8-96dd-f77d19ce72cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages