We have a use case where users are stored by language, and each language's users are split across 19 shards. To perform a business operation on all users of a given language, we stream the shards in parallel (one shard is read by one pod) and fetch the user data.

We have a single column family "a1" with columns: expire, language, state, tenant, userId.

The tenant column holds either "A" or "B".

Suppose we want to process the "A" tenant users. Right now we stream all of that language's users from BT and drop the "B" rows at the code level (and, conversely, drop the "A" rows when we want to process the "B" users).
```
const readStream = await this.bigtable.createReadStream({
  decode: false,
  ranges: [
    {
      start: startRowPointer,
      end: `${shardIndex}_${language}$`,
    },
  ],
  filter: {
    column: {
      cellLimit: 1,
    },
  },
});
```
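For context, the code-level dropping we do today boils down to a predicate applied to each streamed row, roughly like the sketch below (the row shape follows the Node.js Bigtable client; the function name and field access are illustrative):

```javascript
// Illustrative sketch of the current code-level filtering: every row is
// streamed over the network and non-matching tenants are dropped here.
// In the Node.js client, row.data[family][qualifier] is an array of cells,
// newest first; with `decode: false` each cell value arrives as a Buffer.
function belongsToTenant(row, tenant) {
  const cells = row.data && row.data.a1 && row.data.a1.tenant;
  if (!cells || cells.length === 0) return false;
  return cells[0].value.toString() === tenant;
}

// Usage inside the stream handler (sketch):
// readStream.on('data', (row) => {
//   if (!belongsToTenant(row, 'A')) return; // dropped in code
//   process(row);
// });
```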
Here are my questions:

1. Which is more performant: filtering at the BT level (if possible) or filtering at the code level (i.e. bringing every row over the network and eventually dropping it in code), given that we are already doing a full shard read anyway?
For BT-level filtering, I believe I can use the condition filter below:

```
filter: [
  {
    condition: {
      test: [
        { column: 'tenant' },
        { value: 'A' },
      ],
      pass: [{ all: true }],
      fail: [{ all: false }],
    },
  },
]
```
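Wired into the read call, the full options object would look like this (a sketch only: `startRowPointer` and the end pointer come from our code above, the `buildReadOptions` helper name is mine, and I have not verified that the `cellLimit` filter composes with the condition this way):

```javascript
// Full read options with the tenant condition filter added (sketch).
// The condition keeps the entire row when the `tenant` cell matches 'A'
// and blocks the entire row otherwise, so no 'B' rows cross the network.
function buildReadOptions(startRowPointer, endRowPointer) {
  return {
    decode: false,
    ranges: [{ start: startRowPointer, end: endRowPointer }],
    filter: [
      { column: { cellLimit: 1 } }, // keep only the latest cell per column
      {
        condition: {
          test: [{ column: 'tenant' }, { value: 'A' }],
          pass: [{ all: true }],
          fail: [{ all: false }],
        },
      },
    ],
  };
}
```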
2. How can I chain filters with a logical AND, say to keep only users where tenant = 'A' AND state = 'Delhi'? Is that even feasible on the BT side, given it is a wide-column store? I tried the filter below, but it returns zero rows:
```
filter: [
  {
    condition: {
      test: [
        { column: 'tenant' },
        { value: 'A' },
      ],
      pass: [{ all: true }],
      fail: [{ all: false }],
    },
  },
  {
    condition: {
      test: [
        { column: 'state' },
        { value: 'Delhi' },
      ],
      pass: [{ all: true }],
      fail: [{ all: false }],
    },
  },
]
```
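For clarity, the intent behind that filter is: chain two whole-row conditions so that only rows passing both survive. My (unverified) assumption is that a top-level array of filters is composed into a chain, which would make this behave like a logical AND. Expressed with a small builder (the helper name is illustrative):

```javascript
// Sketch of the intended logical AND: each condition either passes the whole
// row (all cells) or blocks it entirely, so chaining two of them should keep
// only rows where tenant = 'A' AND state = 'Delhi'.
function equalsCondition(column, value) {
  return {
    condition: {
      test: [{ column }, { value }],
      pass: [{ all: true }],
      fail: [{ all: false }],
    },
  };
}

const andFilter = [
  equalsCondition('tenant', 'A'),
  equalsCondition('state', 'Delhi'),
];
```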