Help with chain filters

353 views
Skip to first unread message

i...@redivis.com

unread,
Jul 26, 2016, 1:10:15 PM7/26/16
to Google Cloud Bigtable Discuss
Hi all,

I'm having a bit of trouble setting up more complex filters to query Bigtable, and am a little unsure of what is and isn't possible, and whether this is just a syntax issue or if I need to rethink our data access.

As an example, let's say I have a table of patient medical records, and I want to filter out all rows where the patient is 65 or older AND female. I'm using the gcloud-node API; I haven't been able to find a documented example of something like this, but my naive approach looks like this:

[
{
"condition": {
"test": [
{
"condition": {
"test": [
{
"column": "SEX",
"value": "F"
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": true
}
]
}
},
{
"condition": {
"test": [
{
"column": "AGE",
"value": {
"start": 65
}
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": false
}
]
}
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": false
}
]
}
}
]

This may be well wide of the mark, just couldn't find any guidance in the docs. This filter is currently returning an empty set.

Many thanks in advance!

Douglas Mcerlean

unread,
Jul 27, 2016, 3:04:41 PM7/27/16
to Google Cloud Bigtable Discuss
I'm familiar with the underlying proto API, not the idiomatic Node wrapper, but I see several problems with this filter:

1) Your first nested condition is a no-op, as both the true and false cases have "all" set to true.
2) Your second nested test is probably fine in the common case, but if you're storing ages > 100 as strings, they will be < "65" and this won't do what you want. Your values should be fixed-width 3 to make this work for all possible values.
3) I'm not certain whether arrays in the Node API mean pipe the filters through each other or interleave all the results. But if it's the latter, the two nested conditions will produce an item for any row where (age >= 65 OR female), not (age >=65 AND female)
4) The cases on your outer condition are inverted. The outer "test" filter is intended (if the above are fixed) to produce items for rows where (age >= 65 AND female), and you've stated that you want return *all but* those rows.
5) For performance reasons, I'd recommend that you don't use "all" in your pass cases for the inner conditions, but instead just return the two columns that you care about testing. Better yet, if you can, don't make them conditions at all (conditions are very slow). Instead, interleave the two "test" filters and pipe them into whatever Node's equivalent of "cells_per_row_offset_filter(1)" is, so that an item is only returned to the outer condition if both tests return something.
6) If you want a condition case to return nothing, I believe you can just omit it (unless Node changed this from the proto API).

Hoep this helps. For my reference, can you also point me towards the documentation you were looking at?

Kevin Si

unread,
Jul 27, 2016, 4:41:55 PM7/27/16
to Google Cloud Bigtable Discuss

I think what is: "Is male OR age < 65"

[
{
"condition": {
"test": [
{
"condition": {
"test": [
{
"column": "SEX",
"value": "M"
}
],
}
},
{
"condition": {
"test": [
{
"column": "AGE",
"value": {
"end": 65
}
}
],
}
}
],
"pass": [
{
"all": false  # MUST_PASS_ONE
}
]
}
}
]



On Tuesday, July 26, 2016 at 1:10:15 PM UTC-4, i...@redivis.com wrote:

Douglas Mcerlean

unread,
Jul 29, 2016, 5:10:32 PM7/29/16
to Google Cloud Bigtable Discuss
On Wed, Jul 27, 2016 at 1:49 PM, 'Kevin Si' via Google Cloud Bigtable Discuss <google-cloud-b...@googlegroups.com> wrote:

I think what is: "Is male OR age < 65"

[
{
"condition": {
"test": [
{
"condition": {
 
 
This condition doesn't produce any output ever (and in fact probably will be rejected), because it has neither a "pass" nor a "fail."

"test": [
{
"column": "SEX",
"value": "M"
}
],
}
},
{
"condition": {
 
Neither does this one .

"test": [
{
"column": "AGE",
"value": {
"end": 65
}
}
],
}
}
],
"pass": [

This one won't ever produce output either, the "fail" case is omitted and the "pass" case explicitly outputs nothing.
 

--
You received this message because you are subscribed to the Google Groups "Google Cloud Bigtable Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtabl...@googlegroups.com.
To post to this group, send email to google-cloud-b...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-bigtable-discuss/eab528a4-4add-43b5-a9c3-b160a2839585%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

i...@redivis.com

unread,
Jul 31, 2016, 9:40:05 PM7/31/16
to Google Cloud Bigtable Discuss
Douglas (and Kevin), thank you so much for the response — and apologies for the slow reply, I've been out of town the last few days.

I'm still struggling to generate any functional chained filter — I just tried the simplest case and couldn't return any rows — I think I have some fundamental misunderstanding of how the Node wrapper is meant to work. You can find the API of reference here.

[
{
"condition": {
"test": [
{
"condition": {
"test": [
{
"all": true
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": false
}
]
}
},
{
"condition": {
"test": [
{
"all": true
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": false
}
]
}
}
],
"pass": [
{
"all": true
}
],
"fail": [
{
"all": false
}
]
}
}
]

To address a couple of your points:

1) Whoops, typo when posting.
2) I *think* I'm storing ages as numbers, as this is what I'm piping into / receiving from the API. I understand that ultimately everything is stored / ordered as byte arrays in BigTable, and it seems that numeric types are converted as such. Could be wrong.
3) There's a separate 'interleave' option for filters, so I'm assuming that this is the former. The docs are a little unclear here, I'm not sure that I really know.
4) I'm sorry, this was me being unclear. I actually wanted to receive only those rows that are older than 65 and female. But really would be happy receiving anything that's not an empty set at this point (FWIW, there definitely are rows in the table that meet these conditions).
5) Thanks, yes, need to think through performance here. This is more of a first pass to get something working as a proof-of-concept.
6) According to the docs, this is true, but I think this is a bug with the Node wrapper (I just posted an issue to the repo)

Douglas Mcerlean

unread,
Aug 1, 2016, 10:36:55 AM8/1/16
to Google Cloud Bigtable Discuss
Responses inline

I doubt it. The docs refer only to strings, and JS has a nasty reputation for implicit type conversions. Run an "all" filter over the data you've put in and see what you get back.

3) There's a separate 'interleave' option for filters, so I'm assuming that this is the former. The docs are a little unclear here, I'm not sure that I really know.

After reading them, looks like it's a pipe.
 
4) I'm sorry, this was me being unclear. I actually wanted to receive only those rows that are older than 65 and female. But really would be happy receiving anything that's not an empty set at this point (FWIW, there definitely are rows in the table that meet these conditions).
5) Thanks, yes, need to think through performance here. This is more of a first pass to get something working as a proof-of-concept.
6) According to the docs, this is true, but I think this is a bug with the Node wrapper (I just posted an issue to the repo)

Here's my suggested filter (assuming you can omit arrays where you don't want a pipe):

{
  "condition": {
    "test": [
      {
        "interleave": [
          // Returns a cell in rows where SEX == F
          [
            {
              "column": "SEX",
              "cellLimit": 1
            },
            {"value": "F"}     // I'm pretty sure "column" and "value" need to be separate, they aren't listed together in the docs
          ],
          // Returns a cell in rows where AGE > 65
          [
            {
              "column": "AGE",
              "cellLimit": 1
            },
            {
              "value": {
                "start": "065"
                "inclusive": false
              }
            }
          ],
        ]
      },
      // Blocks the first returned cell, so that the test only passes if BOTH interleaved filters return something
      {
        "row": {
          "cellOffset": 1
        }
      }
    ],
    "pass": {"all": true},
    "fail": {"all": false}
  }
}

--
You received this message because you are subscribed to the Google Groups "Google Cloud Bigtable Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtabl...@googlegroups.com.
To post to this group, send email to google-cloud-b...@googlegroups.com.

i...@redivis.com

unread,
Aug 1, 2016, 3:00:07 PM8/1/16
to Google Cloud Bigtable Discuss
Success!! Thanks you so much Douglas.

A couple addenda to the filter you provided:

1) The "cellLimit" attribute needs to be within the column filter, so should look like:

column: {
    name
: "AGE",
    cellLimit
: 1
}

2) Passing numeric values as strings is explicitly not working for my case — I do think that numeric values are being converted appropriately to byte arrays. The code that is handling this (I think) can be found here.

Thanks again — might be good to get some of these more complicated examples in the gcloud-node docs... all in good time.

Ian
Responses inline

To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtable-discuss+unsub...@googlegroups.com.

To post to this group, send email to google-cloud-b...@googlegroups.com.

Douglas Mcerlean

unread,
Aug 1, 2016, 3:19:04 PM8/1/16
to Google Cloud Bigtable Discuss
Oh neat! I didn't see that numeric conversion documented anywhere on the filters, but it seems quite helpful.

We should certainly at least document the tricks for AND and OR. As you saw here, AND can be done by piping an Interleave through a row: cellOffset of (#conditions - 1). A simple OR needs no enhancement, but to nest it within an AND you'd need to pipe it through a row: cellLimit of 1.

Responses inline

To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtabl...@googlegroups.com.
To post to this group, send email to google-cloud-b...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Google Cloud Bigtable Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtabl...@googlegroups.com.

To post to this group, send email to google-cloud-b...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages