Updating an existing non-empty table from a data frame

Mike Trienis

Jul 6, 2015, 5:22:36 PM

to spark-conn...@lists.datastax.com

Okay, I understand from the following that I am not able to update a non-empty table in Cassandra:

> val df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(Map("table" -> "tableName", "keyspace" -> "ks")).load()
> df.write.format("org.apache.spark.sql.cassandra").options(Map("table" -> "tableName", "keyspace" -> "ks")).save()

This produces the error:

>java.lang.UnsupportedOperationException: 'Writing to a non-empty Cassandra Table is not allowed.'

As a workaround, I figured I would convert the DataFrame to an RDD and then call saveToCassandra:

> val rdd = df.rdd
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[23]
> rdd.saveToCassandra("quickstats_db", "tableName", SomeColumns("sample1", "sample2"))

This produces the error:

> scala.ScalaReflectionException: <none> is not a term

I believe this is because the RDD is made up of org.apache.spark.sql.Row elements rather than tuples.
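For the RDD route, one way around the reflection error is to turn each Row into a tuple before calling saveToCassandra; as far as I know, the connector writes tuple fields to the SomeColumns list by position. A minimal sketch, assuming sample1 and sample2 are both text columns (an assumption; adjust the types to the real schema) and that the connector jar is on the classpath:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import com.datastax.spark.connector._ // brings saveToCassandra and SomeColumns into scope

object RowToTupleSketch {
  // Assumption: sample1 and sample2 are both text columns. They are read by
  // position (after the select below), so the Row needs no attached schema.
  def rowToTuple(row: Row): (String, String) =
    (row.getString(0), row.getString(1))

  def save(df: DataFrame): Unit = {
    val tuples: RDD[(String, String)] =
      df.select("sample1", "sample2").rdd.map(rowToTuple)
    // Tuple fields are written to the listed columns in order.
    tuples.saveToCassandra("quickstats_db", "tableName", SomeColumns("sample1", "sample2"))
  }
}
```

The explicit select keeps the tuple order tied to the SomeColumns order, so reordering the DataFrame's columns elsewhere cannot silently swap values.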

Does anyone have some clean code that takes a DataFrame and inserts/updates the related records in a non-empty Cassandra table?

Thanks, Mike.

Jon Haddad

Jul 7, 2015, 1:49:33 PM

to spark-conn...@lists.datastax.com

Hey Mike,

I'm coming at this from the Python side; here's my notebook:

https://github.com/rustyrazorblade/spark-training/blob/master/exercises-solutions.ipynb

In my save() call, I pass "append" as the mode. This isn't specific to Cassandra; it's a Spark feature. Here are the docs on save modes: https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
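In Scala, the same fix is to pass SaveMode.Append (or the string "append") to the DataFrameWriter. With the default ErrorIfExists mode, the connector refuses to write to a table that already has rows, which is where the "non-empty Cassandra Table" error comes from. A minimal sketch reusing the keyspace and table names from the first post, assuming Spark 1.4 or later (for df.write) and the Cassandra data source on the classpath:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object AppendWriteSketch {
  // Keyspace and table names taken from the original post.
  val cassandraOptions: Map[String, String] =
    Map("table" -> "tableName", "keyspace" -> "ks")

  def upsert(df: DataFrame): Unit =
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(cassandraOptions)
      .mode(SaveMode.Append) // same effect as .mode("append")
      .save()
}
```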

Jon

Mike Trienis

Jul 7, 2015, 2:28:36 PM

to spark-conn...@lists.datastax.com

Good stuff, worked like a charm!

Kai Wang

Oct 22, 2015, 8:34:23 AM

to DataStax Spark Connector for Apache Cassandra

Jon,

That is a superb notebook. What is the best way to update a subset of columns in C* via a DataFrame? Do I just create a DataFrame with the selected columns and then save with append?

-Kai

Jonathan Haddad

Oct 22, 2015, 12:06:24 PM

to DataStax Spark Connector for Apache Cassandra

Yep
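A sketch of that partial update: select only the primary key plus the columns to change, then append. Cassandra writes only the columns that are present in the insert, so non-key columns left out of the DataFrame should keep their stored values. The column names id (assumed to be the whole primary key) and status are hypothetical; every primary key column must be among the selected ones:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object PartialUpdateSketch {
  // Hypothetical names: "id" is assumed to be the whole primary key and
  // "status" the only regular column being updated.
  val columnsToWrite: Seq[String] = Seq("id", "status")

  def updateSubset(df: DataFrame): Unit =
    df.select(columnsToWrite.head, columnsToWrite.tail: _*) // keep only these columns
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "tableName", "keyspace" -> "ks"))
      .mode(SaveMode.Append) // upsert; columns not selected are not written
      .save()
}
```

One caveat: a null in a selected column is written as a null, which deletes the stored value for that cell, so filter or fill nulls first if that matters.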

Ashwin Kumar

Apr 6, 2016, 1:50:06 PM

to DataStax Spark Connector for Apache Cassandra, mike.t...@orcsol.com

Good one. How can I do a search and update/upsert using an RDD?

Asvin

Russell Spitzer

Apr 6, 2016, 2:17:43 PM

to DataStax Spark Connector for Apache Cassandra, mike.t...@orcsol.com

If you set the save mode to append, it will upsert:
http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
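Because a Cassandra write is an upsert keyed on the primary key, a "search and update" with the RDD API can be a read, a Spark-side filter, and a write back with saveToCassandra. A sketch against a hypothetical table ks.users (id int PRIMARY KEY, status text); the table, its columns, and the "pending"/"inactive" values are illustrative assumptions, not anything from this thread:

```scala
import org.apache.spark.SparkContext
import com.datastax.spark.connector._ // cassandraTable, saveToCassandra, SomeColumns

object RddUpsertSketch {
  // Hypothetical rule: only rows whose status is "pending" get updated.
  def needsUpdate(row: (Int, String)): Boolean = row._2 == "pending"

  // Keep the primary key, replace the value being changed.
  def updated(row: (Int, String)): (Int, String) = (row._1, "inactive")

  def searchAndUpdate(sc: SparkContext): Unit =
    sc.cassandraTable[(Int, String)]("ks", "users")
      .select("id", "status")
      .filter(needsUpdate) // filtered in Spark, so no secondary index is required
      .map(updated)
      // Same primary key as the rows read, so these writes overwrite them in place.
      .saveToCassandra("ks", "users", SomeColumns("id", "status"))
}
```

Filtering in Spark scans the whole table; if the search is on the partition key, pushing it down with the connector's where clause instead would avoid the full scan.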

--

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md
http://spark-packages.org/package/datastax/spark-cassandra-connector

Pranab Chaudhuri

Dec 27, 2017, 12:56:32 AM

to DataStax Spark Connector for Apache Cassandra, mike.t...@orcsol.com

Hi All,
I have loaded data from Oracle into a Spark DataFrame and dropped unneeded columns to get a DataFrame of 9 columns. My Spark DataFrame schema is:

while my Cassandra table schema has 26 columns, 9 of which match the DataFrame.

If I use df.write.format, what options should I pass to write to the Cassandra DB directly? I am using append only to add data to the existing table (https://groups.google.com/a/lists.datastax.com/forum/#!msg/spark-connector-user/rlGGWQF2wnM/hgq6ox04BAAJ and https://github.com/datastax/spark-cassandra-connector/blob/master/doc/6_advanced_mapper.md).

I think I need to use SomeColumns, but I have not been able to use it correctly.
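As far as I know, SomeColumns belongs to the RDD API (saveToCassandra); a DataFrame write through org.apache.spark.sql.cassandra matches DataFrame columns to table columns by name. So the 9 columns only need names that match their Cassandra counterparts (and must include the full primary key); the other 17 columns are simply not written. A sketch, assuming the Oracle-side names differ from the Cassandra ones only in case (Oracle usually reports unquoted identifiers in upper case, while Cassandra stores them in lower case), with placeholder keyspace and table names:

```scala
import java.util.Locale
import org.apache.spark.sql.{DataFrame, SaveMode}

object OracleToCassandraSketch {
  // Assumption: each Cassandra column name is the lower-cased Oracle name.
  // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
  def cassandraName(oracleName: String): String =
    oracleName.toLowerCase(Locale.ROOT)

  // Rename every column at once; order and data are unchanged.
  def toCassandraColumns(df: DataFrame): DataFrame =
    df.toDF(df.columns.map(cassandraName): _*)

  def append(df: DataFrame): Unit =
    toCassandraColumns(df).write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "tableName")) // placeholder names
      .mode(SaveMode.Append)
      .save()
}
```

If the names differ by more than case, withColumnRenamed on each column, or a different mapping inside cassandraName, would serve the same purpose.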
