>val df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(Map("table" -> "tableName", "keyspace" -> "ks")).load()
>df.write.format("org.apache.spark.sql.cassandra").options(Map( "table" -> "tableName", "keyspace" -> "ks")).save()
This produces the error:
>java.lang.UnsupportedOperationException: 'Writing to a non-empty Cassandra Table is not allowed.'
As a workaround, I figured I'd just convert the DataFrame to an RDD and then perform the save:
> val rdd = df.rdd
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[23]
> rdd.saveToCassandra("quickstats_db", "tableName", SomeColumns("sample1", "sample2"))
This produces the error:
> scala.ScalaReflectionException: <none> is not a term
This is, I believe, because the RDD is made up of org.apache.spark.sql.Row elements instead of a sequence of tuples.
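Presumably I could map the Rows to tuples by hand, something like this (untested, and the column types are guesses):
> import com.datastax.spark.connector._
> df.rdd
>   .map(row => (row.getAs[String]("sample1"), row.getAs[String]("sample2")))
>   .saveToCassandra("quickstats_db", "tableName", SomeColumns("sample1", "sample2"))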
Does anyone have any nice code to take a DataFrame and insert/update the related records in a non-empty Cassandra table?
Thanks, Mike.
I'm coming at this from the Python side; here's a notebook with examples:
https://github.com/rustyrazorblade/spark-training/blob/master/exercises-solutions.ipynb
In my save() call, I pass "append" as the mode. This isn't specific to Cassandra; it's a Spark feature. Here are the docs on save modes: https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
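In Scala it would look roughly like this (untested sketch, reusing the table/keyspace names from your snippet):
> df.write
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "tableName", "keyspace" -> "ks"))
>   .mode("append")  // or .mode(SaveMode.Append); avoids the non-empty-table check
>   .save()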
Jon
Good stuff, worked like a charm!
That is a superb ipynb. I wonder what the best way is to update a subset of columns in C* via a DataFrame. Do I just create a DF with the selected columns and then save with append?
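Something along these lines (just a sketch; the column names are made up, and I assume the selection still has to include the table's full primary key)?
> // keep only the primary key plus the columns to update
> df.select("id", "sample1")
>   .write
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "tableName", "keyspace" -> "ks"))
>   .mode("append")
>   .save()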
-Kai
Good one. May I know how to do a search and update/upsert using an RDD?
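For example, something like this is what I'm after (rough, untested; the case class and the filter are made up):
> import com.datastax.spark.connector._
> case class Sample(id: Int, sample1: String, sample2: String)
> // "search": push the filter down to Cassandra
> val rows = sc.cassandraTable[Sample]("ks", "tableName").where("id = ?", 42)
> // "update/upsert": Cassandra writes are upserts, so saving rows with an
> // existing primary key overwrites just the named columns
> rows.map(r => r.copy(sample1 = "updated"))
>     .saveToCassandra("ks", "tableName", SomeColumns("id", "sample1"))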
Asvin
My Cassandra table schema has 26 columns, of which 9 match the columns in my DF. If I do df.write.format, what options should I give to write to the Cassandra DB directly? I am using append to add data to the existing table only (https://groups.google.com/a/lists.datastax.com/forum/#!msg/spark-connector-user/rlGGWQF2wnM/hgq6ox04BAAJ, https://github.com/datastax/spark-cassandra-connector/blob/master/doc/6_advanced_mapper.md).
I think I need to use SomeColumns but I'm not able to use it correctly.
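This is roughly what I'm trying (table/keyspace names are placeholders; my DF already contains just the 9 matching columns):
> // assumption: columns missing from the DF are simply left unset in Cassandra
> df.write
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "my_table", "keyspace" -> "my_ks"))
>   .mode("append")
>   .save()
My understanding is that SomeColumns only applies to the RDD saveToCassandra path; with the DataFrame writer the column list comes from the DF itself, so selecting just the 9 matching columns (including the primary key) should be enough. Is that right?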