I have an indexed table with the following schema:
CREATE TABLE test (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    lucene text,
    rc1 bigint,
    rc2 bigint,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
);
CREATE INDEX ON test(rc1);
When I run the following queries in CQL:
SELECT * FROM test WHERE rc1=1 AND rc2=1 ALLOW FILTERING;
SELECT * FROM test WHERE rc1=1 AND pk2=1 ALLOW FILTERING;
Both queries use the index on the indexed column rc1, and the other, non-indexed column is filtered during the scan. However, when I do the same with Spark 1.6.2 using spark-cassandra-connector_2.10:1.6.2:
val rdd = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "test"))
  .load()
rdd.filter("rc1=1 AND rc2=1").count
rdd.filter("rc1=1 AND pk2=1").count
Only the first query actually pushes down the filter on the non-indexed column, even though explain claims that all the filters are going to be pushed down:
rdd.filter("rc1=1 AND pk2=1").explain
== Physical Plan ==
Filter (pk2#1 = 1)
+- Scan ... PushedFilters: [EqualTo(rc1,1), EqualTo(pk2,1)]
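One way to double-check which CQL actually reaches Cassandra is to raise the connector's log level before running the count; the logger name below is my guess at the relevant package and may differ between connector versions:

import org.apache.log4j.{Level, Logger}

// Assumption: the connector logs its generated CQL at DEBUG under this package.
Logger.getLogger("com.datastax.spark.connector.rdd").setLevel(Level.DEBUG)
rdd.filter("rc1=1 AND pk2=1").count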
In general, it seems that filters on primary key columns are never pushed down when using the Spark connector; they never reach Cassandra.
Is this the expected behaviour? Am I doing something wrong? Filtering in memory in Spark is less efficient than filtering in Cassandra...
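As a point of comparison, the lower-level RDD API does let me hand a CQL predicate to Cassandra verbatim via where(...). This is a sketch of the workaround I would otherwise fall back to (assuming the SparkContext sc from the setup above), with the indexed predicate served by Cassandra and the partition-key restriction applied in Spark:

import com.datastax.spark.connector._

val count = sc.cassandraTable("test", "test")
  .where("rc1=1")                        // pushed to Cassandra via the rc1 index
  .filter(row => row.getInt("pk2") == 1) // applied in Spark
  .count()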
Thanks,