Question about RDD zip transformation


Guodong Wang

Mar 6, 2013, 12:17:45 AM
to spark...@googlegroups.com
Hi,

I found that the zip transformation on RDDs does not behave as I expected. My Spark version is 0.7.1 (checked out from git and built on my laptop).

For example, run the following steps in the interpreter shell:
scala> val a = 1 to 26
scala> val b = 'a' to 'z'
scala> val ardd = sc.parallelize(a)
scala> val brdd = sc.parallelize(b)
scala> ardd.zip(brdd).take(10)
res3: Array[(Int, Char)] = Array((1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9), (10,10))

I expected the result to be Array((1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f'), (7,'g'), (8,'h'), (9,'i'), (10,'j')).
I am not sure whether this is a bug.
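For comparison, here is the same zip on plain Scala collections, with no Spark involved. Positional pairing gives exactly the result expected above. This is a minimal sketch meant for the Scala REPL or a script:

```scala
// Plain Scala collections (no Spark): zip pairs elements by position.
val a = 1 to 26
val b = 'a' to 'z'

// Take the first ten pairs, mirroring ardd.zip(brdd).take(10) above.
val firstTen = a.zip(b).take(10)
println(firstTen.mkString(", "))
// prints (1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j)
```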

Guodong Wang

Mar 6, 2013, 12:25:41 AM
to spark...@googlegroups.com
When I use
scala> ardd.zip(brdd).collect()
I get the correct output. I also saved the result to disk, and it is correct as well.

So I guess the problem is in "take".
:) 

Josh Rosen

Mar 6, 2013, 12:48:09 AM
to spark...@googlegroups.com



王国栋

Mar 6, 2013, 1:51:31 AM
to spark...@googlegroups.com
It seems these are two different bugs: after I modified the source and ran it again, my problem is still there.

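Furthermore, I found that the result of 'zip' depends on the number of partitions, as in the example I gave above.
If the number of partitions is a factor of rdd.collect().length, the result seems right; if not, the result is wrong. In the run below, 4 partitions do not evenly divide 26 elements, and the output has only 24 pairs, with 'g' and 'u' missing: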

scala> val ardd = sc.parallelize(a,4)
scala> val brdd = sc.parallelize(b,4)
scala> ardd.zip(brdd).collect()
res16: Array[(Int, Char)] = Array((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,h), (8,i), (9,j), (10,k), (11,l), (12,m), (13,n), (14,o), (15,p), (16,q), (17,r), (18,s), (19,t), (20,v), (21,w), (22,x), (23,y), (24,z))
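One way to see how partition boundaries could produce exactly this output is the plain-Scala model below, which uses no Spark API. It rests on two assumptions that this thread does not confirm. First, zip pairs partition i of one RDD with partition i of the other and zips elements inside each pair, stopping at the shorter side. Second, the Int range and the Char range were each cut into 4 partitions with different boundaries. The partition sizes (6,7,6,7 and 7,7,7,5) and the helper names zipByPartition and splitBySizes are hypothetical and exist only for illustration:

```scala
// Hypothetical model of an RDD: a Seq of partitions.
// zip pairs partition i with partition i, then zips element-wise;
// Scala's zip silently drops extra elements from the longer side.
def zipByPartition[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[(A, B)] =
  left.zip(right).flatMap { case (l, r) => l.zip(r) }

// Hypothetical helper: split xs into consecutive chunks of the given sizes.
def splitBySizes[T](xs: Seq[T], sizes: Seq[Int]): Seq[Seq[T]] = {
  val offsets = sizes.scanLeft(0)(_ + _) // e.g. 0, 6, 13, 19, 26
  offsets.zip(offsets.tail).map { case (from, until) => xs.slice(from, until) }
}

val a = 1 to 26
val b = 'a' to 'z'

// Assumed (hypothetical) partition sizes: 4 partitions each, different boundaries.
val aParts = splitBySizes(a, Seq(6, 7, 6, 7))
val bParts = splitBySizes(b, Seq(7, 7, 7, 5))

val zipped = zipByPartition(aParts, bParts)
println(zipped.length) // 24
println(zipped.mkString(", "))
```

Under these assumed sizes the model yields the same 24 pairs as res16, with 'g' and 'u' dropped at the partition boundaries. When both sides happen to get identical partition sizes, every element finds its partner, which fits the observation that some partition counts give the right answer.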

I will look into the code further and try to find the bug.

Thanks.

--
Guodong Wang
王国栋