Fwd: Running streaming on a cluster


Stephan Ewen

Apr 1, 2014, 9:16:55 AM
to stratosp...@googlegroups.com
Hi!

I am bouncing this answer to the mailing list because I think it may also be interesting to other people.

The exception you get comes from the RPC service. The RPC service is currently rather bad at putting useful error messages into its exceptions, but if you look into the logs, you may find a more descriptive message.

I strongly suspect that this is actually a class loader problem. When things work locally but not distributed, that is the prime suspect. When a data type that the RPC wants to transport cannot be found, the RPC calls fail. To verify this, take the jar with your code, place it in the "lib" directory, and try again. If that solves the problem, it is a class loader issue.


Here is how the class loaders currently work in Stratosphere (there are two different class loaders):

  1) The system class loader is used to start the jobmanager and taskmanagers. It has access to all classes that are in jar files in the "lib" directory. The jars in "client_lib" are only added to the classpath of the CLI client and the webclient.

  2) There is a "userCodeClassLoader" for each job, which has the jar files that are part of the user program (the submitted jar and nested jars). In your case (because you are going to the low level API directly), it has the jars attached to the jobgraph. The parent of the user code class loader is the system class loader, so the user code class loader should be able to resolve all classes that you ever access.
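To illustrate the parent/child relationship described above, here is a minimal, self-contained sketch in plain Java. It is not Stratosphere code: the class name ClassLoaderHierarchySketch and the helpers canLoad and userCodeLoader are made up for illustration, and a plain URLClassLoader stands in for the per-job user code class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative only: mimics a per-job loader whose parent is the system class loader.
public class ClassLoaderHierarchySketch {

    /** Returns true if the named class can be resolved through the given loader. */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            // 'false' = do not run static initializers, we only test visibility.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds a child loader over the given jars that delegates to the system loader first. */
    public static URLClassLoader userCodeLoader(URL[] jobJars) {
        return new URLClassLoader(jobJars, ClassLoader.getSystemClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // No job jars are passed here, so the child can only resolve what its parent can.
        try (URLClassLoader user = userCodeLoader(new URL[0])) {
            System.out.println("parent is system loader: "
                    + (user.getParent() == ClassLoader.getSystemClassLoader()));
            System.out.println("child sees java.lang.String: "
                    + canLoad(user, "java.lang.String"));
            System.out.println("child sees no.such.UserType: "
                    + canLoad(user, "no.such.UserType"));
        }
    }
}
```

The lookup only goes one way: the child asks its parent first, but the parent never asks the child. Code that was loaded by the system class loader, such as the RPC layer, therefore cannot see classes that exist only in the job jars.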


When going through the high level Java/Scala/Graph APIs, we ensure that the user code class loader is used whenever user-defined (job-specific) classes are used. Because you are using the lower level API, you need to take care of that yourself.

The call to get the class loader is "LibraryCacheManager.getClassLoader(jobid)". You can get the JobID through "getEnvironment().getJobId()". (It looks like we should add a shortcut to the class loader through the environment.)
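As a rough sketch of what "taking care of that yourself" can look like when deserializing user-defined objects, the standard Java approach is an ObjectInputStream that resolves classes against an explicitly supplied class loader. The class ClassLoaderAwareObjectInputStream and its roundTrip helper are hypothetical names and not part of Stratosphere. In a real job, the loader would be the one obtained from the LibraryCacheManager call quoted above; here the system class loader is used as a stand-in so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical helper: deserializes objects using a caller-supplied class loader.
public class ClassLoaderAwareObjectInputStream extends ObjectInputStream {

    private final ClassLoader loader;

    public ClassLoaderAwareObjectInputStream(InputStream in, ClassLoader loader)
            throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the supplied loader (e.g. the user code class loader).
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default resolution, which also handles primitive types.
            return super.resolveClass(desc);
        }
    }

    /** Serializes a value and reads it back through the given loader. */
    public static Object roundTrip(Serializable value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ClassLoaderAwareObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in loader; a job would pass its user code class loader instead.
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        System.out.println(roundTrip("hello", loader));
    }
}
```

The same pattern applies wherever a class name has to be turned into a Class object: pass the job's class loader explicitly rather than relying on the loader of the calling system class.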

Both RPC and events have no access to the user code class loader right now. The reason is that we never intended to expose those interfaces. The part you are developing is also not really user code, but part of the system functionality (the streaming module). As such, it would make sense to put your classes into the lib folder anyway. Then your classes can be used as types for events and RPC parameters.

Greetings,
Stephan



On Tue, Apr 1, 2014 at 12:18 PM, Márton Balassi <balassi...@gmail.com> wrote:
Hi guys,

We're trying to experiment with bringing streaming to a test cluster. The code we're trying to run is here, which is also packed into a jar and distributed to the lib folder of the taskmanagers; however, we get the following error:

java.io.IOException: Call to hadoop02.ilab.sztaki.hu/10.1.11.2:6123 failed on local exception: java.io.EOFException
at eu.stratosphere.nephele.ipc.Client.wrapException(Client.java:736)
at eu.stratosphere.nephele.ipc.Client.call(Client.java:705)
at eu.stratosphere.nephele.ipc.RPC$Invoker.invoke(RPC.java:249)
at com.sun.proxy.$Proxy0.submitJob(Unknown Source)
at eu.stratosphere.nephele.client.JobClient.submitJobAndWait(JobClient.java:258)
at eu.stratosphere.streaming.test.wordcount.WordCountCluster.main(WordCountCluster.java:83)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at eu.stratosphere.nephele.ipc.Client$Connection.receiveResponse(Client.java:497)
at eu.stratosphere.nephele.ipc.Client$Connection.run(Client.java:443)

Stephan mentioned something about class loaders yesterday, but we haven't managed to figure it out yet.

Do you have any suggestions?


José Luis López Pino

Jun 22, 2014, 6:39:20 AM
to stratosp...@googlegroups.com
Hi,

I was experiencing the same problem until I realised that I hadn't uploaded the same version of Stratosphere to all the nodes. That is why the RPC service was throwing this exception.

PS: I know new messages should go to the new apache-flink mailing list and this thread is old, but I wanted to state this in case someone is hitting their head against the same problem :)

Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
Pino

