Hi,
I'm using Spark to scan a huge Cassandra table (about 250M+ rows). How can I use Spark to update a single field of this table with a value selected from another table?
I'm using Java and spark-cassandra-connector.
Mahmoud,
Can you provide some more details about what you're doing? I'm having a hard time picturing your code.
Are you searching through the huge table for a certain value and, when you find it, updating it with another value? Or are you joining two tables and updating values in one of them?
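Generally speaking, to update a Cassandra table via Spark, you need an RDD (or DataFrame) containing the primary key columns of the table (partition key plus any clustering columns) and the column you want to update. Once you have that, you save it back to Cassandra. Because Cassandra writes are upserts, writing only the key and that one column changes just that column and leaves the rest of each row untouched.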
Something like this:
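import com.datastax.spark.connector._

// Read the whole table as an RDD of CassandraRow (a full table scan)
val tbl = sc.cassandraTable("keyspace", "huge_table")
// Keep only the rows that need updating (this filter runs in Spark, not in Cassandra)
val tblfiltered = tbl.filter(item => item.getString("searchme") == "somevalue")
// Emit (primary key, new value) pairs; unquoted CQL column names are stored in lowercase
val tbltransformed = tblfiltered.map { item =>
  val key = item.getString("keyfield")
  val needsUpdate = "newvalue"
  (key, needsUpdate)
}
// Write only these two columns; the other columns of each row are left as they are
tbltransformed.saveToCassandra("keyspace", "huge_table", SomeColumns("keyfield", "needsupdate"))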
Does that help?
Jim
From: spark-connector-user@lists.datastax.com <spark-connector-user@lists.datastax.com> on behalf of Mahmoud Almokadem <prog.m...@gmail.com>
Sent: Sunday, November 6, 2016 2:45 AM
Subject: Scan and update the same Cassandra table using Spark
Thanks, Mahmoud
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
Hi Mahmoud,
Here is a sample of what you could do (based on your description below), assuming these tables.
//we can join to the country table now because we have the right keys
//we now have all the data we need in one RDD, but in an awkward layout, so map it into an RDD with the format we need (i.e., the same structure as the C* table we want to write to)
//be aware that the joinWithCassandraTable() above is an inner join; if you want a left join, later versions of the connector support that
//now that the data matches the structure of our C* destination table, we can write it out to Cassandra; we need to specify the columns so the Spark Cassandra Connector can map the data into the right places
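A minimal sketch of the flow those comments describe, under loudly stated assumptions: the `ks` keyspace, the `customers` and `countries` tables, all of their columns, the `"unknown"` placeholder and the contact point 127.0.0.1 are hypothetical, not your real schema. It also swaps in Spark's own pair-RDD `join`/`leftOuterJoin` for `joinWithCassandraTable()`. That reads the whole lookup table, which is only sensible when it is small, whereas `joinWithCassandraTable()` fetches just the matching partitions.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical schema, for illustration only:
//   ks.customers (customer_id text PRIMARY KEY, country_code text, country_name text)
//   ks.countries (country_code text PRIMARY KEY, name text)
object UpdateFromLookup {

  // Hypothetical placeholder written when a customer's code has no match (left-join variant)
  val UnknownCountry = "unknown"

  // Pure helper for the left-join variant: use the looked-up name or fall back
  def resolveName(name: Option[String]): String = name.getOrElse(UnknownCountry)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("update-from-lookup")
      // Assumed contact point; point this at your own cluster
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Scan the big table, reading only the primary key and the join column
    val customers = sc.cassandraTable[(String, String)]("ks", "customers")
      .select("customer_id", "country_code")

    // Re-key by country_code so it lines up with the lookup table
    val byCode = customers.map { case (customerId, code) => (code, customerId) }

    // Read the (small) lookup table as (country_code, name) pairs
    val countries = sc.cassandraTable[(String, String)]("ks", "countries")
      .select("country_code", "name")

    // Inner join: customers with no matching country are dropped,
    // so their country_name column is simply not written
    val updates = byCode.join(countries)
      .map { case (_, (customerId, name)) => (customerId, name) }

    // Left-join alternative: every customer gets a value, UnknownCountry when unmatched
    // val updates = byCode.leftOuterJoin(countries)
    //   .map { case (_, (customerId, name)) => (customerId, resolveName(name)) }

    // Write only the key and the updated column; other columns keep their values
    updates.saveToCassandra("ks", "customers", SomeColumns("customer_id", "country_name"))

    sc.stop()
  }
}
```

To get left-join behavior, swap the two `updates` definitions. Writing a placeholder for unmatched rows, rather than skipping them, is a design choice; whether it suits your data is up to you.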
Jim