The inconsistency between RDD.flatMap and the Scala collections API is an issue here. See: http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3CCAKn3j0s7paRiWVjjweEYGHL...@mail.gmail.com%3E