I am trying to troubleshoot an issue with Spark. I am performing a full table scan, but I am seeing errors like the following:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: 10.0.2.6/10.0.2.6:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:218)
at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:284)
at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:115)
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:91)
at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:132)
... 23 more
Having dug through the connector code and the driver code, I have a couple of questions.
1. It looks like the driver sets the number of connections per host to 1 for protocol v3. The number of requests per connection is set to 1024 by default, so it looks like we should be able to run 1024 simultaneous queries against the Cassandra node from our Spark machine (or 256 from a remote machine). I have confirmed that it fails first on the local machine and then also fails on a retry against the remote machine. So is this masking another issue on my side? Is the connection pool really being starved on the local machine? If so, what can I do to prevent this?
2. I have 5 machines and have set the Spark executors to 5 and the executor-cores to 1. How many queries should actually be in flight at once? Isn't this just paging through a result set on a single thread?
3. What other knobs do I have for running a full table scan? I am not worried about the impact on read performance, as there is very little other read load.
Happy to provide any other needed information.
Relevant connector settings I have been looking at (see the sketch below): spark.cassandra.connection.keep_alive_ms and input.fetch.size_in_rows.
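For illustration, here is a minimal sketch of how those settings can be passed through SparkConf (Scala; the host, keyspace, table, and values below are placeholders, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Sketch only: wiring the connector settings mentioned above into SparkConf.
// Host, keyspace, table, and values are placeholders, not tuning advice.
val conf = new SparkConf()
  .setAppName("full-table-scan")
  .set("spark.cassandra.connection.host", "10.0.2.6")
  .set("spark.cassandra.connection.keep_alive_ms", "30000")
  .set("spark.cassandra.input.fetch.size_in_rows", "1000")

val sc = new SparkContext(conf)

// The full table scan itself: count rows in a placeholder table.
val rowCount = sc.cassandraTable("my_keyspace", "my_table").count()
println(s"rows: $rowCount")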
connection.connections_per_executor_max:
Maximum number of connections per host set on each executor JVM. Will be updated to DefaultParallelism / Executors for Spark commands. Defaults to 1 if not specified and not in a Spark environment.
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md
This likely (based on the documentation; I haven't checked the code yet) corresponds to the Cassandra driver connection pooling setting
setMaxConnectionsPerHost()
http://docs.datastax.com/en/developer/java-driver/2.1/manual/pooling/
Tuning that should also help resolve this kind of issue.
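For comparison, here is a minimal sketch (Scala, against the 2.1 Java driver API linked above) of what raising that limit looks like when configuring the driver directly; the contact point and connection counts are illustrative only:

import com.datastax.driver.core.{Cluster, HostDistance, PoolingOptions}

// Sketch only: raising the per-host connection pool limits on the driver.
// Contact point and counts are illustrative, not recommendations.
val poolingOptions = new PoolingOptions()
  .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
  .setMaxConnectionsPerHost(HostDistance.LOCAL, 8)

val cluster = Cluster.builder()
  .addContactPoint("10.0.2.6")
  .withPoolingOptions(poolingOptions)
  .build()

When going through the connector instead of a standalone driver session, the equivalent (if I am reading the reference page right) is setting spark.cassandra.connection.connections_per_executor_max on the Spark config.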