Spark Cassandra connector with SSL


Bertrand Brelier

Jan 22, 2016, 1:43:15 PM
to DataStax Spark Connector for Apache Cassandra
Hello everybody,

I am trying to use the Spark Cassandra connector on a cluster that uses SSL.

I created a certificate, and I can access the database using cqlsh --ssl.

I can also connect using Python:

from ssl import PROTOCOL_TLSv1
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

ssl_opts = {'ca_certs': '/XXXXX.pem', 'ssl_version': PROTOCOL_TLSv1}
ap = PlainTextAuthProvider(username='XXXXXX', password='XXXXXXXXXXXXXX')
cluster = Cluster(['XXXXXXXXX'], auth_provider=ap, ssl_options=ssl_opts)


But when I tried:

MASTER=spark://XXXXXXX:7077 $SPARK_HOME/bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1 --conf spark.cassandra.connection.host=XXX.XXX.XXX.XXX --conf spark.cassandra.connection.ssl.enabled=true --conf spark.cassandra.connection.ssl.trustStore.path=/XXXX.truststore --conf spark.cassandra.connection.ssl.trustStore.password='XXXXX'

I got the error message:

Failed to open native connection to Cassandra at XXX.XXX.XXX.XXX:XXXX

I think I am not using the options correctly (I looked online for an example but could not find one).

Could someone please post an example of connecting to Cassandra with SSL from Spark?

Thank you,

Cheers,

Bertrand

Eric Meisel

Jan 25, 2016, 11:46:04 AM
to DataStax Spark Connector for Apache Cassandra

Hi Bertrand -

We're running SSL in our production cluster. Can you provide a full stack-trace of the error?

Bertrand Brelier

Jan 25, 2016, 12:09:39 PM
to DataStax Spark Connector for Apache Cassandra
Hello Eric,

Thank you for asking for the full stack trace; I realized that I had missed another error message:

java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

The full log is attached.

Thanks for your help,

Cheers,

Bertrand

[attachment: log]

Eric Meisel

Jan 25, 2016, 12:11:13 PM
to DataStax Spark Connector for Apache Cassandra
Ah, yes, I've run into that before. What we did to get around it was to add the Java Cryptography Extension (JCE) jar files to the JRE on the C* node(s). Check out the article below for more information:

https://support.datastax.com/hc/en-us/articles/204226129-Receiving-error-Caused-by-java-lang-IllegalArgumentException-Cannot-support-TLS-RSA-WITH-AES-256-CBC-SHA-with-currently-installed-providers-on-DSE-startup-after-setting-up-client-to-node-encryption




Bertrand Brelier

Jan 25, 2016, 1:48:52 PM
to DataStax Spark Connector for Apache Cassandra
Hello Eric,

Thanks for your prompt answer. I followed the procedure, and that error message is gone.

I now have this problem:

py4j.protocol.Py4JJavaError: An error occurred while calling o31.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXXX}:XXXX
Caused by: com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host /XXXXXXX:XXXXXXX: Host /XXXXXX:XXX requires authentication, but no authenticator found in Cluster configuration


Here is the command I used:
MASTER=spark://XXXXXXXXX:7077 $SPARK_HOME/bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1 --conf spark.cassandra.connection.host=XXXXXXXXXXXXXXXXX --conf spark.cassandra.connection.ssl.enabled=true --conf spark.cassandra.connection.ssl.trustStore.path=/XXXXXXX.truststore --conf spark.cassandra.connection.ssl.trustStore.password='XXXXXXX' --conf spark.cassandra.auth.username='XXXXXXXXXXXXXXXX' --conf spark.cassandra.auth.username='XXXXXXXXXXX'


Instead of providing all the passwords on the command line, is there a way to use the credentials from .cassandra/cqlshrc?

Thank you,

Cheers,

Bertrand

Eric Meisel

Jan 25, 2016, 1:53:37 PM
to DataStax Spark Connector for Apache Cassandra
Hey Bertrand,

I've personally not used authentication, so I don't know whether using that file is possible.

Something you can do is set up a default Spark configuration (spark-defaults.conf); the values you are supplying on the command line can be defaulted in that configuration file instead.
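For what it's worth, cqlshrc is a plain INI file, so one possible workaround is to parse it yourself and hand the values to Spark. A minimal Python 3 sketch, assuming the usual [authentication] section with username and password keys (the SparkConf lines are commented out and purely illustrative):

```python
import configparser
import os

def cqlshrc_credentials(path="~/.cassandra/cqlshrc"):
    """Pull username/password from cqlshrc's [authentication] section."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(path))
    return (cfg.get("authentication", "username"),
            cfg.get("authentication", "password"))

# The credentials could then go into the Spark conf instead of the CLI, e.g.:
# user, pwd = cqlshrc_credentials()
# conf = (SparkConf()
#         .set("spark.cassandra.auth.username", user)
#         .set("spark.cassandra.auth.password", pwd))
```

Note this only covers the [authentication] section; the SSL settings would still need to be mapped to the connector's trustStore options by hand.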

Bertrand Brelier

Jan 25, 2016, 2:55:58 PM
to DataStax Spark Connector for Apache Cassandra
Hello Eric,

Thanks for your help.

I made a mistake in my command: I passed spark.cassandra.auth.username twice, where the second option should have been spark.cassandra.auth.password.

After fixing this error, I can connect to the cluster and run my Cassandra queries from Spark.

I still have a question: rather than using the trustStore, is it possible to use a certificate (***.pem), like the one I use with cqlsh --ssl or with Python's ssl_opts = {'ca_certs': '/XXXXXX/XXX.pem', 'ssl_version': PROTOCOL_TLSv1}?

Thank you,

Cheers,

Bertrand

Eric Meisel

Jan 25, 2016, 4:05:59 PM
to DataStax Spark Connector for Apache Cassandra
I've only ever used this with the truststore/keystore myself. I don't believe that's possible.

Bertrand Brelier

Jan 28, 2016, 9:13:26 AM
to DataStax Spark Connector for Apache Cassandra
Hello everybody,

I have changed the Cassandra configuration in cassandra.yaml:

client_encryption_options:
    require_client_auth: true    # changed from false


I have created two certificates: node0.cer.pem and node0.key.pem.

I can access the database with cqlsh --ssl after updating .cassandra/cqlshrc with:

[ssl]
certfile = /XXXXXXXX.pem
validate = true
# The next 2 lines must be provided when require_client_auth = true in the cassandra.yaml file
userkey = /XXXXXXX/node0.key.pem
usercert = /XXXXXXX/node0.cer.pem


But when I tried to use pyspark to access the Cassandra database:

results = sqlContext.read.format("org.apache.spark.sql.cassandra").load(table="mytable", keyspace="test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXXXXXX}:XXXX
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /XXXXXXXXXX:XXXXXXXXXX (com.datastax.driver.core.exceptions.TransportException: [/XXXXXXXXXX] Channel has been closed))


Which options do I have to add to fix this issue?

Thank you for your help,

Cheers,

Bertrand

Noorul Islam Kamal Malmiyoda

Jan 28, 2016, 9:24:19 AM
to spark-conn...@lists.datastax.com
There should be some info in the Cassandra log (/var/log/cassandra/system.log).

Thanks and Regards
Noorul

Bertrand Brelier

Jan 28, 2016, 9:32:23 AM
to DataStax Spark Connector for Apache Cassandra
Hello Noorul,

I do not see any error messages in /var/log/cassandra/system.log.

What kind of info am I looking for?

Spark and Cassandra work well when require_client_auth is false, but not when require_client_auth is set to true.

Thanks for your help.

Cheers,

Bertrand

Bertrand Brelier

Jan 29, 2016, 8:18:41 AM
to DataStax Spark Connector for Apache Cassandra
Hello everybody,

I can connect to the database using Python with:

from ssl import PROTOCOL_TLSv1
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

ssl_opts = {'ca_certs': '/XXXX/XXX.pem',
            'ssl_version': PROTOCOL_TLSv1,
            'keyfile': '/XXXX.key.pem',
            'certfile': '/XXXXXX.cer.pem'}
ap = PlainTextAuthProvider(username='XXXXXX', password='XXXXXX')
cluster = Cluster(['XXXXXXX'], auth_provider=ap, ssl_options=ssl_opts)
session = cluster.connect('test')

But the connection from spark to Cassandra fails.

Here are my options in spark-defaults.conf:

spark.master spark://XXXXXX:XXXX
spark.cassandra.auth.username XXXX
spark.cassandra.auth.password XXXX
spark.cassandra.connection.ssl.trustStore.password XXXX
spark.cassandra.connection.ssl.trustStore.path /etc/cassandra/sbx.truststore
spark.cassandra.connection.ssl.enabled true
spark.cassandra.connection.host XXXX
spark.jars.packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-RC1


Are there any other options I have to provide when require_client_auth is set to true ?

Bertrand Brelier

Feb 2, 2016, 11:20:06 AM
to DataStax Spark Connector for Apache Cassandra
Hello everybody,

Does anyone have a suggestion for this connection problem?

Thank you for your help,

Cheers,

Bertrand

Noorul Islam K M

Feb 2, 2016, 11:35:18 AM
to Bertrand Brelier, DataStax Spark Connector for Apache Cassandra

What exactly is the error you are getting?

Regards,
Noorul

Bertrand Brelier

Feb 2, 2016, 11:41:00 AM
to DataStax Spark Connector for Apache Cassandra, bertrand...@gmail.com
Hello Noorul,

>>> results = sqlContext.read.format("org.apache.spark.sql.cassandra").load(table="mytable", keyspace="test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: java.io.IOException: Failed to open native connection to Cassandra at {XXXXXX}:XXXXXX
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:212)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /XXXXXX:XXXXXX (com.datastax.driver.core.exceptions.TransportException: [/XXXXXX] Channel has been closed))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1382)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
... 22 more

Thanks for your help,

Cheers,

Bertrand

Bertrand Brelier

Feb 2, 2016, 2:07:27 PM
to DataStax Spark Connector for Apache Cassandra, bertrand...@gmail.com
Hello everybody,

I should probably have asked this question first: is two-way SSL encryption supported by the Spark Cassandra connector?

Thank you,

Cheers,

Bertrand

eugene miretsky

Feb 2, 2016, 3:53:59 PM
to spark-conn...@lists.datastax.com, bertrand...@gmail.com
The DataStax Enterprise version does: https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkPwdAppl.html

Not sure about the regular open-source connector.
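For later readers: newer releases of the open-source connector (from around 1.6) added client-certificate settings, so two-way SSL should be configurable without DSE. Assuming such a version, the spark-defaults.conf entries would look roughly like this (property names per the connector's reference documentation; paths and passwords are placeholders):

```
spark.cassandra.connection.ssl.enabled              true
spark.cassandra.connection.ssl.trustStore.path      /path/to/node.truststore
spark.cassandra.connection.ssl.trustStore.password  XXXX
spark.cassandra.connection.ssl.clientAuth.enabled   true
spark.cassandra.connection.ssl.keyStore.path        /path/to/node.keystore
spark.cassandra.connection.ssl.keyStore.password    XXXX
```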


