Considerations on using ZomboDB

662 views
Skip to first unread message

nielsk...@autouncle.com

unread,
Dec 22, 2016, 6:26:19 AM12/22/16
to ZomboDB
I have quite large Postgresql table, millions of records, (let's call it "cars"). The table holds 100+ columns of string, number, datatime and booleans. Around 50 of them are queried heavily (and in many different combinations, quite unpredictable). This means, that even though I do NOT require any full text capabilities (searches are only reqular things like "WHERE price >= 10000 AND fuel_type = 'Diesel' AND has_automatic_transimission IS TRUE AND created_at > '20160101'"), the table is super hard to index, due to the shifting nature of the queries. Also the table is updated quite heavily, which also means that indexing get's heavy. 

Now for this reason I want to use Elastic for indexing those 50 columns and searching them efficiently. I'm considering using ZomboDB for carrying the job of keeping the index up-to-date (I can easily live with a minute delay max). My questions are now:

  • Can I create a partial index of my data using zomboDB? From the "cars" table I only need to partially index the data based on a string column "availability" which can take one of two values: "sold" and "for sale" - I only want to index those that are "for sale" (postgresql supports partial indexes like "CREATE INDEX price_for_sale_on_cars ON cars USING btree (price) WHERE availability = 'for sale'").
  • Can I query the data using none-full-text search functionality like ? : 
{
    "query": {
        "range" : {
            "price" : {
                "gte" : 10000,
                "lte" : 20000
            }
        }
    }
}
  • Is it possible to create indexes across tables? Let's say "cars" has many "sellers" and I wanted the column postcode from sellers to be indexed as well?



Eric B. Ridge

unread,
Dec 28, 2016, 10:39:01 PM12/28/16
to zom...@googlegroups.com
Hi Niels! I apologize for the delayed response. I had family in town
for the holidays and my wife told me I wasn't allowed to use the
computer. They just left this morning so my restrictions have been
lifted.

I'll try to answer more below.

On Thu, Dec 22, 2016 at 4:26 AM, nielsk...@autouncle.com wrote:
> I have quite large Postgresql table, millions of records, (let's call
> it "cars"). The table holds 100+ columns of string, number, datatime
> and booleans.

That's a typical workload for ZomboDB. I have people using it far
fewer rows and columns up to the 1B-row range. And with a large enough
ES cluster you should be able to search as many rows as you can imagine.


> Around 50 of them are queried heavily (and in many different
> combinations, quite unpredictable). This means, that even though I do
> NOT require any full text capabilities (searches are only reqular
> things like "WHERE price >= 10000 AND fuel_type = 'Diesel' AND
> has_automatic_transimission IS TRUE AND created_at > '20160101'"),
> the table is super hard to index,

ZDB will let you index the handful of fields you actually want to
query. It's not well documented, but it's Issue #125 on github:
https://github.com/zombodb/zombodb/issues/125


> due to the shifting nature of the queries. Also the table is updated
> quite heavily, which also means that indexing get's heavy.

Know that while ZDB indexing is very fast (for what it is), it's not
necessarily going to improve your indexing performance. After all, it
does have to make round-trips to a remote Elasticsearch cluster.

ZDB does provide a number of options around bulk indexing batch size,
concurrency, network compression, and of course you can do all sorts of
things with Elasticsearch to improve indexing performance.


> Now for this reason I want to use Elastic for indexing those 50
> columns and searching them efficiently. I'm considering using ZomboDB
> for carrying the job of keeping the index up-to-date (I can easily
> live with a minute delay max). My questions are now:

There would never be a delay, per se. Because ZomboDB is a real
Postgres Index Access Method (akin to Postgres 'btree' or 'gin' index
types), indexing happens synchronously within the transaction
performing the UPDATE/INSERT.

Note that concurrent PG backends *will* communicate with the ES cluster
in parallel.

This is how ZDB is able to guarantee MVCC-correctness across concurret
transactions.

> Can I create a partial index of my data using zomboDB? From the
> "cars" table I only need to partially index the data based on a
> string column "availability" which can take one of two values: "sold"
> and "for sale" - I only want to index those that are "for sale"
> (postgresql supports partial indexes like "CREATE INDEX
> price_for_sale_on_cars ON cars USING btree (price) WHERE availability
> = 'for sale'").

Yes, you can. However, when you query you'd need to specify that
predicate on the actual SELECT statement otherwise Postgres won't know
to use your partial index. Using your CREATE INDEX example, the
corresponding SELECT would be:

SELECT * FROM cars WHERE zdb('cars', ctid) ==> 'green honda' AND
availability = 'for sale';

Note that the predicate for availability is NOT part of the ZDB query.

> Can I query the data using none-full-text search functionality like ?
> :
>> {
>> "query": {
>> "range" : {
>> "price" : {
>> "gte" : 10000,
>> "lte" : 20000
>> }
>> }
>> }
>> }


Yes, absolutely. The SYNTAX.md doc describes all the various query
constructs, but ranges are fully supported for both numeric and text
types.

The syntax for your specific example is:

SELECT ... WHERE zdb('cars', ctid) ==> 'price 10000 /to/ 20000';

> Is it possible to create indexes across tables? Let's say "cars" has
> many "sellers" and I wanted the column postcode from sellers to be
> indexed as well?

Well, so ZomboDB has a concept of "index linking" that lets you achieve
what you're talking about. You'd CREATE INDEX on cars and CREATE INDEX
on sellers, and then you'd describe how to "link" (think JOIN) them
together. And then ZDB is capable of transparently joining the data
behind the scenes within the ES cluster. See the INDEX-OPTIONS.md file
for more details
(https://github.com/zombodb/zombodb/blob/master/INDEX-OPTIONS.md).
This is definitely an advanced-use feature and I'd be surprised if you
didn't have follow-up questions about this.

Again, sorry for the delayed response. I'm back in action, so feel
free to reach out if you need help.

eric





Eric B. Ridge

unread,
Dec 28, 2016, 10:42:18 PM12/28/16
to zom...@googlegroups.com
On Wed, Dec 28, 2016 at 8:38 PM, Eric B. Ridge <eeb...@gmail.com> wrote:
>
>> Can I query the data using none-full-text search functionality like
>> ? :
>>> {
>>> "query": {
>>> "range" : {
>>> "price" : {
>>> "gte" : 10000,
>>> "lte" : 20000
>>> }
>>> }
>>> }
>>> }
>
>
> Yes, absolutely. The SYNTAX.md doc describes all the various query
> constructs, but ranges are fully supported for both numeric and text
> types.
>
> The syntax for your specific example is:
>
> SELECT ... WHERE zdb('cars', ctid) ==> 'price 10000 /to/ 20000';

Oops. Make that:

SELECT ... WHERE zdb('cars', ctid) ==> 'price = 10000 /to/ 20000';

eric

nielsk...@autouncle.com

unread,
Jan 5, 2017, 8:07:39 AM1/5/17
to ZomboDB
HI Eric, Thanks for all your great answers! Do you have an ETA of version 4?

Eric Ridge

unread,
Jan 6, 2017, 5:10:49 AM1/6/17
to ZomboDB
Hopefully around the middle of February.

eric
--
You received this message because you are subscribed to the Google Groups "ZomboDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zombodb+u...@googlegroups.com.
To post to this group, send email to zom...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/zombodb/71633b93-fe1c-4269-b0b0-6da549595650%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eric Ridge

unread,
Jan 20, 2017, 7:30:14 AM1/20/17
to ZomboDB
Hi again Niels!

I have a pull request open for ES 2.4.4 support.  All of ZDB's tests pass and it appears to be working well.  https://github.com/zombodb/zombodb/pull/151

If you're interesting in trying it out, just let me know and I'll attach a set of artifacts for PG & ES.  Note that I haven't worked out upgrade/downgrade procedures yet, so you'd probably want to try this out on a separate system.

eric
Reply all
Reply to author
Forward
0 new messages