How to turn on query logging

Yana Kadiyska

Oct 28, 2015, 6:11:43 PM
to spark-conn...@lists.datastax.com
Hi folks, 

I have a DataFrame with a pretty reasonable query plan:

Filter ((group_id#128 = 2543) && (epoch#130L > 1445583600))
         PhysicalRDD [...list of all my columns here]

but the Cassandra nodes time out on the read, and the Spark workers produce exceptions with a scary message of the kind "select ... FROM .... WHERE token("group_id") > ? AND token("group_id") <= ?" (I'm confused here by the token wrapping and the inequalities).

Is there a way to log the query that the Spark connector is sending to Cassandra based on this plan? Also, do Spark queries issued via the connector leave any traces in system_traces.sessions or system_traces.events? I don't see anything there, and I'm not sure whether the fact that I'm passing a set of hosts to the connector, versus a single host to cqlsh when investigating, is misleading me somehow.

My hope is to separate Cassandra performance from connector/Spark issues, so I'd like to trace the exact query being sent down and verify in cqlsh that it can in fact return quickly.

Russell Spitzer

Oct 28, 2015, 6:31:42 PM
to spark-conn...@lists.datastax.com
Turn on debug logging, and the lines at
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraTableScanRDD.scala#L197-L200
will tell you what CQL is executed.
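
For example, a minimal sketch of that (assuming log4j 1.x, which Spark used at the time) is a single line in your log4j.properties:

    # Print the CQL that the connector generates for each partition scan
    log4j.logger.com.datastax.spark.connector=DEBUG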

These queries will appear in system_traces.sessions and system_traces.events, BUT only if you turn query tracing on. It is off by default, but you could always turn on probabilistic query tracing and get the results for some fraction of the queries.
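
As a sketch, assuming you have nodetool access to the cluster, that could look like:

    # Trace roughly 10% of requests on this node; run on each node of interest
    nodetool settraceprobability 0.1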

Yana Kadiyska

Oct 29, 2015, 11:47:58 AM
to spark-conn...@lists.datastax.com
Thanks Russell. Can you elaborate a little more on two points:

1) I've set log4j.logger.com.datastax.spark.connector=TRACE in the driver's log4j properties file. I see the lines at https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraTableScanRDD.scala#L146-L147 show up, but not L197-L200. Do I need to modify the log4j files used by the Spark workers to see the lines you mention?

2) How can I turn query tracing on? (Sorry, this is more of a general Cassandra question, I guess.) I know how to do it in cqlsh for the duration of a session, but in this case I want to turn it on either server-side for all sessions (which would be OK short term) or from the Spark Cassandra driver.

Thanks a lot for your help

Russell Spitzer

Oct 29, 2015, 1:52:10 PM
to spark-conn...@lists.datastax.com
1) The lines I mentioned should appear in the driver's log, not on the executors (or workers). If you don't see those log lines, perhaps the execution is not getting that far?

2) Query tracing for all incoming requests can only be turned on server-side: http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html. I don't believe there is currently a way to do this from the connector.
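
Once tracing is on, a quick sanity check in cqlsh (assuming the standard system_traces schema) might be:

    -- Each traced request gets a row here once probabilistic tracing is active
    SELECT session_id, started_at, request
    FROM system_traces.sessions
    LIMIT 10;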

Alexey Ponkin

Nov 2, 2015, 7:06:37 AM
to DataStax Spark Connector for Apache Cassandra
On Thursday, October 29, 2015 at 8:52:10 PM UTC+3, Russell Spitzer wrote:
Hi Russell,
So to enable connector logging we need to create a log4j.properties file in the Spark job's resources directory and set log4j.logger.com.datastax.spark.connector=TRACE? Is that correct?

Russell Spitzer

Nov 2, 2015, 11:17:06 AM
to DataStax Spark Connector for Apache Cassandra
Any method you like; you just need your log4j properties file set for the running application.
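
For instance, one common approach (a sketch with placeholder paths and class names, not the only way) is to pass a custom file at submit time:

    # Hypothetical paths/class; adjust to your deployment (log4j 1.x style)
    spark-submit \
      --files /local/path/log4j.properties \
      --driver-java-options "-Dlog4j.configuration=file:/local/path/log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
      --class com.example.MyJob my-job.jar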