joinWithCassandraTable is an inner join (it only returns rows where both sides of the join have a match), not a Cartesian join (where each row is paired with every row in the other table). What are you trying to do?
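To make the inner-join behavior concrete, here is a sketch. The table name `test.kv`, its contents, and the variable names are hypothetical; it assumes a running Cassandra cluster and an existing SparkContext `sc`:

```scala
import com.datastax.spark.connector._

// Hypothetical table: CREATE TABLE test.kv (key int PRIMARY KEY, value text)
// populated with rows for keys 1 and 2 only.
val left = sc.parallelize(Seq(Tuple1(1), Tuple1(2), Tuple1(99)))

// Each left-side row triggers a single-partition read against test.kv.
// Key 99 has no matching partition, so it is simply absent from the
// result -- an inner join, not a Cartesian expansion.
val joined = left.joinWithCassandraTable("test", "kv")
// joined.collect() yields (left row, Cassandra row) pairs for keys 1 and 2 only
```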
/**
 * Uses the data from [[org.apache.spark.rdd.RDD RDD]] to join with a Cassandra table without
 * retrieving the entire table.
 * Any RDD which can be used to saveToCassandra can be used to joinWithCassandra, as well as any
 * RDD which only specifies the partition key of a Cassandra table. This method executes single
 * partition requests against the Cassandra table and accepts the functional modifiers that a
 * normal [[com.datastax.spark.connector.rdd.CassandraTableScanRDD]] takes.
 *
 * By default this method only uses the partition key for joining, but any combination of columns
 * which is acceptable to C* can be used in the join. Specify columns using the joinColumns
 * parameter or the on() method.
 *
 * Example with prior repartitioning: {{{
 *   val source = sc.parallelize(keys).map(x => new KVRow(x))
 *   val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
 *   val someCass = repart.joinWithCassandraTable(keyspace, tableName)
 * }}}
 *
 * Example joining on clustering columns: {{{
 *   val source = sc.parallelize(keys).map(x => (x, x * 100))
 *   val someCass = source.joinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
 * }}}
 */