If the RDDs have no Partitioner, or they have different partitioners, union basically just makes a giant mishmash of all your tasks: the input partitions are combined as-is, with no regrouping by key.
See https://github.com/apache/spark/blob/v2.0.2/core/src/main/scala/org/apache/spark/SparkContext.scala#L1218-L1226 for the relevant code. This means you now have tons of Spark partitions, some of which have data for the same C* partition key. For example, Spark Partition A can have entries for Cassandra Key A, but Spark Partition B can ALSO have entries for Cassandra Key A.
Since spanBy requires all of the values for A to be contiguous, you get odd behavior; I'm assuming multiple groups for the same partition key.
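To make the effect concrete, here is a minimal plain-Python sketch. It is not the connector's code: `span_by` is a hypothetical stand-in that groups only consecutive rows with equal keys inside a single partition, and the two "RDDs" are just lists of partitions with made-up rows.

```python
from itertools import groupby

def span_by(partition, key_fn):
    """Group only *consecutive* rows with equal keys, within one partition."""
    return [(k, list(rows)) for k, rows in groupby(partition, key=key_fn)]

# Two hypothetical source RDDs, one partition each; both hold rows for key "A".
rdd1_partitions = [[("A", 1), ("A", 2), ("B", 3)]]
rdd2_partitions = [[("A", 4), ("C", 5)]]

# Union without a shared partitioner: the partitions are simply placed
# side by side, nothing is regrouped by key.
unioned = rdd1_partitions + rdd2_partitions

# Span each partition independently, as a per-partition operation would.
groups = [g for part in unioned for g in span_by(part, lambda row: row[0])]
keys = [k for k, _ in groups]
print(keys)  # ['A', 'B', 'A', 'C']
```

Key "A" shows up as two separate groups because its rows live in two different partitions, which matches the behavior I'm guessing at above.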
Using sortBy regroups your data based on the Cassandra partition key, so all rows for a key end up together again, but this is a full shuffle. Calling repartitionByCassandraReplica is yet another shuffle, which only sorts based on InetAddress and not PK value.
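A rough sketch of why the sort helps, again in plain Python rather than Spark. The `range_sort` helper and its `boundaries` argument are hypothetical; they loosely imitate a range-partitioned sort, where every row for a given key is sent to the same output partition and each partition is then sorted.

```python
from bisect import bisect_left
from itertools import groupby

def range_sort(partitions, key_fn, boundaries):
    """Send each row to the partition owning its key range, then sort each one."""
    out = [[] for _ in range(len(boundaries) + 1)]
    for part in partitions:
        for row in part:
            # This is the "shuffle": rows move between partitions based on key.
            out[bisect_left(boundaries, key_fn(row))].append(row)
    return [sorted(p, key=key_fn) for p in out]

def key_of(row):
    return row[0]

# The unioned partitions: key "A" is split across both of them.
unioned = [[("A", 1), ("A", 2), ("B", 3)], [("A", 4), ("C", 5)]]

# One boundary at "A": keys <= "A" go to partition 0, the rest to partition 1.
sorted_parts = range_sort(unioned, key_of, boundaries=["A"])

# Group consecutive rows per partition, the way a span-style operation would.
keys = [k for part in sorted_parts for k, _ in groupby(part, key=key_of)]
print(sorted_parts)  # [[('A', 1), ('A', 2), ('A', 4)], [('B', 3), ('C', 5)]]
print(keys)          # ['A', 'B', 'C']
```

Every key now forms exactly one contiguous run, at the price of moving every row across partitions once.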