val procData3 = spark.read.format("org.apache.spark.sql.cassandra").options(Map( "table" -> "process_data3", "keyspace" -> "processkeyspace")).load.cache()
Is there an option that can be passed to include WriteTime and TTL for the columns? Or do I have to do:
sc.cassandraTable[(String, Int, Double, Long, Option[Long], String)]("processkeyspace", "process_data3")
  .select("host", "running_processes", "system_cpu_usage", WriteTime("running_processes"), TTL("running_processes"), "collection_time")
and then convert the CassandraTableScanRDD to a Dataset?
If it's the latter then any tips for converting a CassandraTableScanRDD to a Dataset would be appreciated.
Thanks,
John Engstrom
I’ve asked that same question before, and AFAIK it is not possible to retrieve these with the Dataset API. I filed an issue for this in Jira back in February: https://datastax-oss.atlassian.net/projects/SPARKC/issues/SPARKC-528
There are a couple of ways to convert an RDD to a Dataset. I use a case class like this:
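import java.sql.Date
import java.util.concurrent.TimeUnit
import com.datastax.spark.connector._
case class MyRow(field1: String, writeTime: java.sql.Date)
val rdd = sc.cassandraTable(keyspaceName, tableName)
  .select("field1", WriteTime("field1", Some("field1_writetime")))
  .map(row => MyRow(
    row.get[String]("field1"),
    // the write time is in microseconds since the epoch; java.sql.Date expects milliseconds
    new Date(TimeUnit.MILLISECONDS.convert(row.get[Long]("field1_writetime"), TimeUnit.MICROSECONDS))
  ))
// createDataFrame returns a DataFrame (Dataset[Row]) whose columns match the fields of MyRow
val ds = spark.sqlContext.createDataFrame(rdd)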
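One caveat with the snippet above: Spark maps java.sql.Date to its day-precision DateType, so the time of day of the write is dropped once the rows are in a DataFrame. If the full write time matters, a java.sql.Timestamp field may be a better fit. Below is a minimal sketch of the two conversions, assuming the write time is a Long in microseconds since the epoch and the TTL is an optional number of remaining seconds (Option[Long], as in the tuple type from the question). The names CassandraMeta, writeTimeToTimestamp and ttlToExpiry are hypothetical helpers, not part of the connector.

```scala
import java.sql.Timestamp
import java.time.Instant

// Hypothetical helper object; not part of the Spark Cassandra Connector.
object CassandraMeta {

  // Converts a write time in microseconds since the Unix epoch into a
  // java.sql.Timestamp, keeping the microsecond part instead of rounding
  // down to milliseconds.
  def writeTimeToTimestamp(writeTimeMicros: Long): Timestamp = {
    val seconds = Math.floorDiv(writeTimeMicros, 1000000L)
    val microsOfSecond = Math.floorMod(writeTimeMicros, 1000000L)
    val ts = new Timestamp(seconds * 1000L)   // whole seconds, as milliseconds
    ts.setNanos((microsOfSecond * 1000L).toInt) // fractional second, as nanoseconds
    ts
  }

  // Turns a remaining TTL in seconds (None when the cell has no TTL) into an
  // approximate expiry instant, relative to the time the row was read.
  def ttlToExpiry(ttlSeconds: Option[Long], readAt: Instant): Option[Instant] =
    ttlSeconds.map(s => readAt.plusSeconds(s))
}
```

With a case class whose write-time field is a java.sql.Timestamp, the map step would call CassandraMeta.writeTimeToTimestamp(row.get[Long]("field1_writetime")) in place of the new Date(...) expression.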