error loading classes in app jar

Andrew Xue

Apr 28, 2012, 6:58:11 PM

to cascalog-user

Not sure how much this has to do with Cascalog per se, but I have a
really confounding issue and maybe someone can help. I have a job that
is failing, and the stack trace in the job logs looks like this:

Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: views.visit-facts, compiling:(views/visit_facts.clj:1)
    at clojure.lang.Compiler.analyze(Compiler.java:6235)
    at clojure.lang.Compiler.analyze(Compiler.java:6177)
    ...
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: views.visit-facts
    at clojure.lang.Util.runtimeException(Util.java:165)
    at clojure.lang.RT.classForName(RT.java:2017)
    ...
Caused by: java.lang.ClassNotFoundException: views.visit-facts
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    ...
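
One way to sanity-check the ClassNotFoundException before digging into
replication is to look inside the jar for the files that would back the
namespace. Clojure maps dots in a namespace to directories and hyphens
to underscores, so views.visit-facts should show up as
views/visit_facts.clj and, if AOT-compiled, as class files starting
with views/visit_facts. The sketch below is only an illustration: the
helper name find_namespace_entries and the jar path in the usage note
are made up, and the prefix match is deliberately loose.

```python
import zipfile


def find_namespace_entries(jar_path, namespace):
    """Return jar entries whose path starts with the namespace's file prefix."""
    # Clojure file naming: dots become directories, hyphens become underscores.
    prefix = namespace.replace(".", "/").replace("-", "_")
    with zipfile.ZipFile(jar_path) as jar:
        # Loose prefix match: also catches names like views/visit_facts__init.class.
        return sorted(name for name in jar.namelist() if name.startswith(prefix))
```

Calling find_namespace_entries("job.jar", "views.visit-facts") on a
local copy of the jar (path assumed) returns an empty list if nothing
for that namespace was packaged.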

I suspect that the job jar was not being replicated correctly. Looking
at the daemon logs, I see that the namenode has errors replicating
jobtracker.info:

INFO org.apache.hadoop.ipc.Server (IPC Server handler 6 on 9000): IPC Server handler 6 on 9000, call addBlock(/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_1731950709) from 10.194.15.165:51308: error: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

On the datanode side I see errors with receiving the job jar.

The namenode log says:

2012-04-28 19:05:55,582 INFO org.apache.hadoop.hdfs.StateChange (IPC Server handler 11 on 9000): DIR* NameSystem.completeFile: file /mnt/var/lib/hadoop/tmp/mapred/system/job_201204281904_0001/job.jar is closed by DFSClient_-387163361

The datanode log says:

INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace (PacketResponder 0 for Block blk_832986479003239919_1004): src: /10.195.89.124:53001, dest: /10.76.91.41:9200, bytes: 39825229, op: HDFS_WRITE, cliID: DFSClient_-387163361, srvID: DS-304531098-10.76.91.41-9200-1335639918679, blockid: blk_832986479003239919_1004

WARN org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@1a5d08): DatanodeRegistration(10.76.91.41:9200, storageID=DS-304531098-10.76.91.41-9200-1335639918679, infoPort=9102, ipcPort=9201):Failed to transfer blk_138586677137070325_1009 to 10.37.67.149:9200 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
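
Note that the two datanode entries above mention different block IDs
(…_1004 in the clienttrace line, …_1009 in the failed transfer). To
line entries up by block across namenode and datanode logs, something
like the sketch below could help. It is a rough illustration only: the
helper name block_ids is made up, and the regex assumes block IDs look
like blk_<number>_<generation>, with an optional minus sign.

```python
import re

# Assumed shape of an HDFS block ID as it appears in these logs.
BLOCK_ID = re.compile(r"blk_-?\d+_\d+")


def block_ids(log_text):
    """Return the distinct block IDs in log_text, in order of first appearance."""
    seen = []
    for match in BLOCK_ID.finditer(log_text):
        block = match.group(0)
        if block not in seen:
            seen.append(block)
    return seen
```

Running block_ids over each daemon's log text and comparing the lists
shows which blocks each side actually mentions.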

My first gut reaction was that maybe the jar is too big and too hard to
replicate. However, the oddest part of all this is that:

(1) Other scripts in this jar work fine. With those I don't see the
replication problems with jobtracker.info or the job.jar.
(2) Portions of the visit-facts script work fine as well: the
subqueries it depends on run without the above issues.

So it seems that something specific to this script is affecting how
Hadoop replicates its jobtracker.info and job.jar, which does not make
a whole lot of sense to me.

I am running this on AWS EMR, and I get the same problem with Hadoop
versions 0.20 and 0.20.205.

Any insight or guesses on this issue are welcome.

Andrew Xue

Apr 28, 2012, 7:08:49 PM

to cascalog-user

Actually, I think I read the logs wrong. The failure seems to happen
when one datanode is transmitting the job.jar to another
datanode ...

Andrew Xue

Apr 28, 2012, 7:15:01 PM

to cascalog-user

... and the receiving datanode is throwing a
BlockAlreadyExistsException.
