Hi folks,
I have a dataframe with a pretty reasonable query plan:
Filter ((group_id#128 = 2543) && (epoch#130L > 1445583600))
PhysicalRDD [...list of all my columns here]
but the Cassandra nodes time out on the read, and the Spark workers throw an exception containing a scary query of the form "SELECT ... FROM ... WHERE token("group_id") > ? AND token("group_id") <= ?" (I'm confused by the token() wrapping and the range inequalities -- I expected a plain equality on group_id).
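My best guess is that the connector is splitting the token ring into contiguous per-partition scan ranges, which would explain the shape of that WHERE clause. A toy sketch of that idea (purely illustrative, not the connector's actual code -- range count and column name are made up):

```python
# Toy illustration: split the Murmur3 token ring into N contiguous
# scan ranges, one per Spark partition. Each range would then be
# scanned as: WHERE token(group_id) > lo AND token(group_id) <= hi
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_ranges(num_splits):
    span = (MAX_TOKEN - MIN_TOKEN) // num_splits
    bounds = [MIN_TOKEN + i * span for i in range(num_splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

for lo, hi in token_ranges(4):
    print(f'token("group_id") > {lo} AND token("group_id") <= {hi}')
```

If that's right, each Spark task scans one slice of the ring rather than hitting the group_id partition directly -- which is what I'd like to confirm by seeing the real queries.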
Is there a way to log the exact query the Spark connector sends to Cassandra for this plan? Also, do queries issued via the connector leave any traces in system_traces.sessions or system_traces.events? I don't see anything there, but I'm not sure whether the fact that I pass a set of hosts to the connector, versus a single host to cqlsh when investigating, is misleading me somehow.
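One thing I haven't tried yet: I believe tracing is off by default for ordinary queries, and the only server-side knob I know of is probabilistic tracing, which should sample connector queries into system_traces as well, e.g.:

```
# Sample ~0.1% of all requests into system_traces (run on each node);
# set back to 0 afterwards, since tracing is expensive.
nodetool settraceprobability 0.001
```

Would that be the recommended way to catch what the connector is doing, or is there a connector/driver-side logging option I'm missing?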
My hope is to separate Cassandra performance from connector/Spark issues, so I'd like to capture the exact query being sent down and verify in cqlsh that it does in fact return quickly.
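Concretely, once I have the literal query, my plan is to run something like this against a single node (keyspace/table name and token bounds below are placeholders I made up):

```
cqlsh> TRACING ON;
cqlsh> SELECT * FROM my_keyspace.my_table
       WHERE token(group_id) > -3074457345618258603
         AND token(group_id) <= 3074457345618258602;
```

With TRACING ON, cqlsh should print the trace for the statement, which would let me compare Cassandra-side latency for the exact range scan against what the Spark job sees.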