Citus & Other multi-tenant questions

56 views

Skip to first unread message

RoyBR

unread,

Aug 15, 2018, 5:48:25 PM8/15/18

to ZomboDB

Hi,

I'm just getting started looking into Zombodb as an alternative for an existing system using separate database, indexing, and search layers...

A couple of questions. Does ZomboDB support citusdb? How is data from different shards allocated to the ES cluster? Is this predictable/controllable?

In our usecase the sharding is based on a tenant id in our application. As such, we want data to be isolated in almost every case (probably always for searching)... Is it possible to assign an index on a per-tenant basis for example?

Is it possible to make a query "mixed" with an ES query and a DB query? i.e. SELECT * FROM mytable WHERE tenant_id = 123 AND description ==>'(keywords:(sports OR box) . for example? is this an efficient way to query? (or would ES be querying the entire index for mytable behind the scenes, and then Zombo is filtering out the conditions?)...?

I'm not sure if I'm making any sense...its been a long day lol..otherwise I'll post some more illustrated examples.

Eric Ridge

unread,

Aug 15, 2018, 6:18:36 PM8/15/18

to zom...@googlegroups.com

On Aug 15, 2018, at 3:48 PM, RoyBR <rna...@neoassist.com> wrote:
Hi,

I'm just getting started looking into Zombodb as an alternative for an existing system using separate database, indexing, and search layers...

You've come to the right place! ;)

A couple of questions. Does ZomboDB support citusdb? How is data from different shards allocated to the ES cluster? Is this predictable/controllable?

This is complicated to answer. Previous versions of ZDB (prior to the version released this week that supports PG 10) did, at one point, work with Citus, to some degree. I even submitted a few PRs to Citus in order to enable the two to work together.

However, I'm not sure about how well ZDB & Citus work together here in 2018. I do have tentative plans of evaluating them together, but probably not until after mid-September. I'm focused on getting the new version of ZDB stable and generally production ready.

Related: I've recently received similar questions about ZDB + TimescaleDB.

In general tho, Citus would create a remote Elasticsearch index for each Citus shard. All the shards would use the same ES cluster, just each shard would have a different index. Then as Citus decides to query a specific shard, if a ZDB query clause was involved, then it would use the ES index for just that shard.

Where things get difficult with querying is around ZDB's aggregate functions such as `zdb.count()` and `zdb.tally()`.

In our usecase the sharding is based on a tenant id in our application. As such, we want data to be isolated in almost every case (probably always for searching)... Is it possible to assign an index on a per-tenant basis for example?

Well, if you're talking about plain Postgres (ie, not Citus) then sure, you can create partial indices by tenant_id, for example:

CREATE INDEX idx_tenant_123 ON table USING zombodb ((table.*)) WITH (url='...') WHERE tenant_id = 123;

And then when you query it, you have to include the predicate from the CREATE INDEX statement in your query:

SELECT * FROM table WHERE table ==> 'zdb query here' AND tenant_id = 123;

If you're talking about Citus, then Citus would handle all this automatically -- assuming the combination of the two don't just cause Postgres to segfault.

Is it possible to make a query "mixed" with an ES query and a DB query? i.e. SELECT * FROM mytable WHERE tenant_id = 123 AND description ==>'(keywords:(sports OR box) . for example? is this an efficient way to query? (or would ES be querying the entire index for mytable behind the scenes, and then Zombo is filtering out the conditions?)...?

It is, but your example query would need to look like:

SELECT * FROM table WHERE tenant_id = 123 AND table ==> 'description:(sports or box)';

The point is that ZDB needs to query against the table reference (named "table" here) rather than a specific field. ZDB indexes whole (or partial) rows, not individual columns.

I'm not sure if I'm making any sense...its been a long day lol..otherwise I'll post some more illustrated examples.

Same. :)

If you'd like to create a new Issue on ZomboDB's GitHub repository about future Citus support we can discuss it there in more detail. Like I said, I have tentative plans to get around to it, but it'll be a little bit before I have time.

eric

rna...@neoassist.com

unread,

Aug 17, 2018, 9:33:27 AM8/17/18

to ZomboDB

On Wednesday, August 15, 2018 at 7:18:36 PM UTC-3, Eric Ridge wrote:

On Aug 15, 2018, at 3:48 PM, RoyBR <rna...@neoassist.com> wrote:
Hi,

I'm just getting started looking into Zombodb as an alternative for an existing system using separate database, indexing, and search layers...

You've come to the right place! ;)

Great! :D

A couple of questions. Does ZomboDB support citusdb? How is data from different shards allocated to the ES cluster? Is this predictable/controllable?

This is complicated to answer. Previous versions of ZDB (prior to the version released this week that supports PG 10) did, at one point, work with Citus, to some degree. I even submitted a few PRs to Citus in order to enable the two to work together.

However, I'm not sure about how well ZDB & Citus work together here in 2018. I do have tentative plans of evaluating them together, but probably not until after mid-September. I'm focused on getting the new version of ZDB stable and generally production ready.

Related: I've recently received similar questions about ZDB + TimescaleDB.

In general tho, Citus would create a remote Elasticsearch index for each Citus shard. All the shards would use the same ES cluster, just each shard would have a different index. Then as Citus decides to query a specific shard, if a ZDB query clause was involved, then it would use the ES index for just that shard.

Where things get difficult with querying is around ZDB's aggregate functions such as `zdb.count()` and `zdb.tally()`.

Understood... I did see a few old links on google etc, but I wasnt quite sure...

Nonetheless your explanation makes sense.

In our usecase the sharding is based on a tenant id in our application. As such, we want data to be isolated in almost every case (probably always for searching)... Is it possible to assign an index on a per-tenant basis for example?

Well, if you're talking about plain Postgres (ie, not Citus) then sure, you can create partial indices by tenant_id, for example:

CREATE INDEX idx_tenant_123 ON table USING zombodb ((table.*)) WITH (url='...') WHERE tenant_id = 123;

And then when you query it, you have to include the predicate from the CREATE INDEX statement in your query:

SELECT * FROM table WHERE table ==> 'zdb query here' AND tenant_id = 123;

If you're talking about Citus, then Citus would handle all this automatically -- assuming the combination of the two don't just cause Postgres to segfault.

Is it possible to make a query "mixed" with an ES query and a DB query? i.e. SELECT * FROM mytable WHERE tenant_id = 123 AND description ==>'(keywords:(sports OR box) . for example? is this an efficient way to query? (or would ES be querying the entire index for mytable behind the scenes, and then Zombo is filtering out the conditions?)...?

It is, but your example query would need to look like:

SELECT * FROM table WHERE tenant_id = 123 AND table ==> 'description:(sports or box)';

(you are right.. I hadnt caught on to the table-wide index, and actually copied the query from one of the examples and didnt paste it properly, but thanks for pointinng both points!)

The point is that ZDB needs to query against the table reference (named "table" here) rather than a specific field. ZDB indexes whole (or partial) rows, not individual columns.

I'm not sure if I'm making any sense...its been a long day lol..otherwise I'll post some more illustrated examples.

Same. :)

All clear!!

If you'd like to create a new Issue on ZomboDB's GitHub repository about future Citus support we can discuss it there in more detail. Like I said, I have tentative plans to get around to it, but it'll be a little bit before I have time.

We are still evaluating pros/cons of an eventual migration to postgres at this point. On another note, have you given any thought about integrating Elasticsearch SQL into ZomboDB? I have used it for testing and found it quite intuitive as a way of bringing developeers into ES, bringing it into the RDBMS would be even better... (Obviously it would need some thought, but the idea sounds interesting...?)