I was running some tests with glom(), and apparently Spark is not partitioning the data: I ask for two partitions, but all four elements end up in a single one:
>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> rdd.glom().collect()
13/07/15 15:04:28 INFO spark.SparkContext: Starting job: collect at NativeMethodAccessorImpl.java:-2
13/07/15 15:04:28 INFO scheduler.DAGScheduler: Got job 0 (collect at NativeMethodAccessorImpl.java:-2) with 2 output partitions (allowLocal=false)
13/07/15 15:04:28 INFO scheduler.DAGScheduler: Final stage: Stage 0 (PythonRDD at NativeConstructorAccessorImpl.java:-2)
13/07/15 15:04:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/07/15 15:04:28 INFO scheduler.DAGScheduler: Missing parents: List()
13/07/15 15:04:28 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[1] at PythonRDD at NativeConstructorAccessorImpl.java:-2), which has no missing parents
13/07/15 15:04:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[1] at PythonRDD at NativeConstructorAccessorImpl.java:-2)
13/07/15 15:04:29 INFO local.LocalScheduler: Running ResultTask(0, 0)
13/07/15 15:04:29 INFO local.LocalScheduler: Size of task 0 is 2525 bytes
13/07/15 15:04:30 INFO local.LocalScheduler: Finished ResultTask(0, 0)
13/07/15 15:04:30 INFO local.LocalScheduler: Running ResultTask(0, 1)
13/07/15 15:04:30 INFO local.LocalScheduler: Size of task 1 is 2611 bytes
13/07/15 15:04:30 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
13/07/15 15:04:30 INFO local.LocalScheduler: Finished ResultTask(0, 1)
13/07/15 15:04:30 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
13/07/15 15:04:30 INFO scheduler.DAGScheduler: Stage 0 (PythonRDD at NativeConstructorAccessorImpl.java:-2) finished in 1.671 s
13/07/15 15:04:30 INFO spark.SparkContext: Job finished: collect at NativeMethodAccessorImpl.java:-2, took 2.264232229 s
[[], [1, 2, 3, 4]]
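For context, what I expected was an even split like [[1, 2], [3, 4]]. The pure-Python sketch below (no Spark needed) shows that expectation; slice_evenly is a hypothetical helper I wrote that, as far as I can tell, roughly mirrors the index arithmetic Spark uses when slicing a local collection into partitions:

```python
def slice_evenly(data, num_slices):
    """Split a list into num_slices contiguous chunks of near-equal size.

    Partition i covers indices [i*n // num_slices, (i+1)*n // num_slices),
    which I believe is roughly how Spark slices a parallelized collection.
    """
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

print(slice_evenly([1, 2, 3, 4], 2))  # [[1, 2], [3, 4]]
```

That is the distribution I assumed rdd.glom().collect() would report, rather than one empty partition and one partition holding everything.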