I am trying to use the Spark Cassandra connector on a cluster that uses SSL.
I created a certificate, and I can access the database with cqlsh --ssl.
I can also connect using Python:
from ssl import PROTOCOL_TLSv1
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_opts = {'ca_certs': '/XXXXX.pem', 'ssl_version': PROTOCOL_TLSv1}
ap = PlainTextAuthProvider(username='XXXXXX', password='XXXXXXXXXXXXXX')
cluster = Cluster(['XXXXXXXXX'], auth_provider=ap, ssl_options=ssl_opts)
But when I tried:
MASTER=spark://XXXXXXX:7077 $SPARK_HOME/bin/pyspark \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1 \
  --conf spark.cassandra.connection.host=XXX.XXX.XXX.XXX \
  --conf spark.cassandra.connection.ssl.enabled=true \
  --conf spark.cassandra.connection.ssl.trustStore.path=/XXXX.truststore \
  --conf spark.cassandra.connection.ssl.trustStore.password='XXXXX'
I got the error message:
Failed to open native connection to Cassandra at XXX.XXX.XXX.XXX:XXXX
I think I am not using the options correctly (I looked online but could not find an example).
Could someone please post an example of connecting to Cassandra with SSL from Spark?
Thank you,
Cheers,
Bertrand
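For reference: the JVM driver underneath the connector reads Java keystores rather than .pem files, so a truststore like the one referenced above is typically produced by importing the CA certificate with keytool. A minimal sketch (alias, paths, and password are placeholders):

# import the CA certificate (the same .pem used by cqlsh/python) into a JKS truststore
# alias, paths, and password below are placeholders
keytool -importcert -trustcacerts -alias cassandra \
        -file /XXXXX.pem -keystore /XXXX.truststore \
        -storepass 'XXXXX' -noprompt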
Hi Bertrand -
We're running SSL in our production cluster. Can you provide the full stack trace of the error?
Thank you for asking for the full stack trace. Looking at it, I realized that I had missed another error message:
java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
The full log is attached.
Thanks for your help,
Cheers,
Bertrand
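For reference: "Cannot support TLS_RSA_WITH_AES_256_CBC_SHA" typically means the JVM running Spark is missing the JCE Unlimited Strength Jurisdiction Policy files, which 256-bit AES cipher suites require. On Oracle JDK 8, installing them looks roughly like this (download name and JAVA_HOME layout are illustrative):

# Oracle JDK 8: copy the unlimited-strength policy jars into the running JRE
# (archive name and JAVA_HOME layout are illustrative)
unzip jce_policy-8.zip
cp UnlimitedJCEPolicyJDK8/local_policy.jar \
   UnlimitedJCEPolicyJDK8/US_export_policy.jar \
   "$JAVA_HOME/jre/lib/security/"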
Thanks for your prompt answer. I followed the procedure, and that error message is gone.
Now I have this problem:
py4j.protocol.Py4JJavaError: An error occurred while calling o31.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXXX}:XXXX
Caused by: com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host /XXXXXXX:XXXXXXX: Host /XXXXXX:XXX requires authentication, but no authenticator found in Cluster configuration
Here is the command I used:
MASTER=spark://XXXXXXXXX:7077 $SPARK_HOME/bin/pyspark \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1 \
  --conf spark.cassandra.connection.host=XXXXXXXXXXXXXXXXX \
  --conf spark.cassandra.connection.ssl.enabled=true \
  --conf spark.cassandra.connection.ssl.trustStore.path=/XXXXXXX.truststore \
  --conf spark.cassandra.connection.ssl.trustStore.password='XXXXXXX' \
  --conf spark.cassandra.auth.username='XXXXXXXXXXXXXXXX' \
  --conf spark.cassandra.auth.username='XXXXXXXXXXX'
Instead of providing all the passwords on the command line, is there a way to use the credentials from .cassandra/cqlshrc?
Thank you,
Cheers,
Bertrand
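For reference: the connector does not read .cassandra/cqlshrc, but Spark will pick these settings up from conf/spark-defaults.conf instead of the command line, e.g.:

# conf/spark-defaults.conf (values are placeholders)
spark.cassandra.auth.username XXXX
spark.cassandra.auth.password XXXX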
Thanks for your help.
I made a mistake in my command: I had passed spark.cassandra.auth.username twice, when the second option should have been spark.cassandra.auth.password.
After fixing this, I can connect to the cluster and run my Cassandra queries from Spark.
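That is, the working pair of options is:

--conf spark.cassandra.auth.username='XXXXXXXX' \
--conf spark.cassandra.auth.password='XXXXXXXX'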
I still have a question: rather than using the trustStore, is it possible to use a .pem certificate like the one I use with cqlsh --ssl, or like the Python ssl_opts = {'ca_certs': '/XXXXXX/XXX.pem', 'ssl_version': PROTOCOL_TLSv1}?
Thank you,
Cheers,
Bertrand
I have changed the Cassandra configuration in cassandra.yaml:
client_encryption_options:
    require_client_auth: true    # changed from false
I have created a client key and certificate: node0.key.pem and node0.cer.pem.
I can access the database with cqlsh --ssl after updating .cassandra/cqlshrc with:
[ssl]
certfile = /XXXXXXXX.pem
validate = true
# The next 2 lines must be provided when require_client_auth = true in the cassandra.yaml file
userkey = /XXXXXXX/node0.key.pem
usercert = /XXXXXXX/node0.cer.pem
But when I tried to use pyspark to access the Cassandra database:
results = sqlContext.read.format("org.apache.spark.sql.cassandra").load(table="mytable", keyspace="test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXXXXXX}:XXXX
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /XXXXXXXXXX:XXXXXXXXXX (com.datastax.driver.core.exceptions.TransportException: [/XXXXXXXXXX] Channel has been closed))
Which options do I need to add to fix this issue?
Thank you for your help,
Cheers,
Bertrand
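For reference: the JVM-side equivalent of cqlshrc's userkey/usercert is a client keystore, which can be assembled from the same key/certificate pair with standard openssl and keytool steps. A minimal sketch, with placeholder names and passwords (whether the connector version in use can consume such a keystore is a separate question; see the note at the end of the thread):

# bundle the client key and certificate into a PKCS12 file, then convert it to JKS
# names and passwords are placeholders
openssl pkcs12 -export -in /XXXXXXX/node0.cer.pem -inkey /XXXXXXX/node0.key.pem \
        -out node0.p12 -name node0 -passout pass:XXXXX
keytool -importkeystore -srckeystore node0.p12 -srcstoretype PKCS12 \
        -srcstorepass XXXXX -destkeystore node0.keystore -deststorepass XXXXX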
I do not see any error messages in /var/log/cassandra/system.log.
What kind of info should I be looking for?
Spark and Cassandra work fine when require_client_auth is false, but not when it is set to true.
Thanks for your help.
Cheers,
Bertrand
I can connect to the database using Python with:
from ssl import PROTOCOL_TLSv1
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_opts = {'ca_certs': '/XXXX/XXX.pem',
            'ssl_version': PROTOCOL_TLSv1,
            'keyfile': '/XXXX.key.pem',
            'certfile': '/XXXXXX.cer.pem'}
ap = PlainTextAuthProvider(username='XXXXXX', password='XXXXXX')
cluster = Cluster(['XXXXXXX'], auth_provider=ap, ssl_options=ssl_opts)
session = cluster.connect('test')
But the connection from Spark to Cassandra fails.
Here are my options in spark-defaults.conf:
spark.master spark://XXXXXX:XXXX
spark.cassandra.auth.username XXXX
spark.cassandra.auth.password XXXX
spark.cassandra.connection.ssl.trustStore.password XXXX
spark.cassandra.connection.ssl.trustStore.path /etc/cassandra/sbx.truststore
spark.cassandra.connection.ssl.enabled true
spark.cassandra.connection.host XXXX
spark.jars.packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1
Are there any other options I have to provide when require_client_auth is set to true?
Does anyone have a suggestion for the connection problem? Here is the full traceback:
>>> results = sqlContext.read.format("org.apache.spark.sql.cassandra").load(table="mytable", keyspace="test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXX}:XXXXXX
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:212)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /XXXXXX:XXXXXX (com.datastax.driver.core.exceptions.TransportException: [/XXXXXX] Channel has been closed))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1382)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
... 22 more
Thanks for your help,
Cheers,
Bertrand
I should probably have asked this question first: is two-way SSL encryption (client certificate authentication) supported by the Spark Cassandra connector?
Thank you,
Cheers,
Bertrand
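For reference: client-certificate (two-way) SSL does not appear to be configurable in the 1.5.0-RC1 connector used in this thread, which is consistent with the failures above; later connector releases (1.6+) document keystore options for exactly this case. Assuming such a release, the spark-defaults.conf additions would look like this (paths and password are placeholders):

# spark-defaults.conf additions for two-way SSL (1.6+ connector; placeholder values)
spark.cassandra.connection.ssl.clientAuth.enabled true
spark.cassandra.connection.ssl.keyStore.path /XXXX/node0.keystore
spark.cassandra.connection.ssl.keyStore.password XXXX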