Question about Constraints.toEntityPredicate and Constraints.toKeyPredicate

RD SR

Apr 23, 2015, 12:12:42 AM
to cdk...@cloudera.org
Hi CDK users,
  Could someone help me understand the code in the Constraints class better?
Here's my doubt.

I see that Constraints.toEntityPredicate is used to filter records. Partition-level filtering is already applied at the file level when the Reader is constructed. So as I see it, every record read from the reader already satisfies the partition predicates, if any. Why apply the partition predicates again when filtering records?


Best,
R.

Ryan Blue

Apr 23, 2015, 2:28:49 PM
to RD SR, cdk...@cloudera.org
The entity predicate applies a set of constraints to an entity, without
making assumptions about what other predicates have already been applied.

You're right that while reading we can simplify a set of predicates and
remove the ones that are satisfied by the partitioning scheme. To do
that, we have another method: minimizeFor(StorageKey). That way, you can
filter storage keys and for each one get a new set of residual
constraints. Then using that new set, you can get a predicate for the
entities in that partition.
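
For readers who want to see the shape of that flow, here is a minimal sketch in plain Java. It is a toy model, not the CDK code: ToyConstraints, mapOf, and the field names are hypothetical, constraints are reduced to simple equality checks, and a partition key is assumed to carry the entity field values directly. Only the three method names (toEntityPredicate, toKeyPredicate, minimizeFor) come from this thread, and their real signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: these classes are hypothetical stand-ins, not the CDK API.
final class ResidualConstraintsSketch {

    /** Hypothetical constraint set: each named field must equal a value. */
    static final class ToyConstraints {
        final Map<String, Object> required;

        ToyConstraints(Map<String, Object> required) {
            this.required = new HashMap<>(required);
        }

        /** Checks every constraint against an entity, assuming no prior filtering. */
        Predicate<Map<String, Object>> toEntityPredicate() {
            return entity -> {
                for (Map.Entry<String, Object> c : required.entrySet()) {
                    if (!c.getValue().equals(entity.get(c.getKey()))) {
                        return false;
                    }
                }
                return true;
            };
        }

        /** Checks only the constraints on fields that the partition key carries. */
        Predicate<Map<String, Object>> toKeyPredicate() {
            return key -> {
                for (Map.Entry<String, Object> c : required.entrySet()) {
                    if (key.containsKey(c.getKey())
                            && !c.getValue().equals(key.get(c.getKey()))) {
                        return false;
                    }
                }
                return true;
            };
        }

        /** Returns the residual constraints: those the key does not already satisfy. */
        ToyConstraints minimizeFor(Map<String, Object> key) {
            Map<String, Object> residual = new HashMap<>();
            for (Map.Entry<String, Object> c : required.entrySet()) {
                if (!c.getValue().equals(key.get(c.getKey()))) {
                    residual.put(c.getKey(), c.getValue());
                }
            }
            return new ToyConstraints(residual);
        }
    }

    /** Small helper to build a map from alternating field names and values. */
    static Map<String, Object> mapOf(Object... pairs) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            m.put((String) pairs[i], pairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        ToyConstraints constraints =
                new ToyConstraints(mapOf("year", 2015, "color", "red"));
        Map<String, Object> partitionKey = mapOf("year", 2015);

        // Step 1: filter partitions by key. Step 2: minimize for the key.
        if (constraints.toKeyPredicate().test(partitionKey)) {
            ToyConstraints residual = constraints.minimizeFor(partitionKey);
            // Step 3: build the per-record predicate from the residual set.
            Predicate<Map<String, Object>> perRecord = residual.toEntityPredicate();
            System.out.println(residual.required.keySet()); // prints [color]
            System.out.println(perRecord.test(mapOf("color", "red"))); // prints true
        }
    }
}
```

Note that the residual predicate is only valid for records read from the partition it was minimized for. Applied to an arbitrary entity, it would no longer check "year", which is why the unminimized entity predicate has to check everything.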

Does that make sense?

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

RD

Apr 24, 2015, 1:12:12 AM
to Ryan Blue, cdk...@cloudera.org
Yes it does. Thanks Ryan!