Trying to set up the Behemoth project in debug mode in Eclipse

rajini maski

Dec 12, 2011, 2:51:48 AM
to DigitalPebble
Hi,

I am trying to run Behemoth on my local machine. I have set up Hadoop
in pseudo-distributed mode and configured HDFS as well. Now I am
trying to debug it in Eclipse by creating a project from the existing
Ant build file (Tika module only), and I can see the Tika module set
up as an individual project. [I have added all the jar files from
behemoth-gate.job and the Tika module's libraries.]

I am stuck on supplying the input and output arguments. When I debug
it as a Java application, it asks for two arguments, and I pass the
following:
-i /data/SampleDocument.pdf
-o /data

Both paths exist in HDFS, and HDFS is configured in Hadoop's
core-site.xml and core-default.xml.

When I run this, I get the following error:

WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/data/SampleDocument.pdf
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)


Is there any other configuration needed in behemoth-site.xml or
behemoth-default.xml?
Is the input specified correctly?
How do I set up the complete Behemoth project in debug mode?
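
A note on the exception above: the `file:` prefix in `file:/data/SampleDocument.pdf` suggests that the path was resolved against the local filesystem rather than HDFS. This typically happens when Hadoop's configuration directory is not on the classpath of the Eclipse launch, so the default filesystem falls back to `file:///`. The sketch below illustrates that resolution using only the Java standard library. `qualify` is a hypothetical helper, not a Hadoop API, and `hdfs://localhost:9000/` is an assumed NameNode address, not one taken from this thread.

```java
import java.net.URI;

class PathSchemeSketch {
    // Hypothetical helper (NOT a Hadoop API): mimics how a path without a
    // scheme is qualified against the default filesystem URI.
    public static URI qualify(String path, URI defaultFs) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already fully qualified, leave it untouched
        }
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        // Default filesystem when no site configuration is picked up.
        URI local = URI.create("file:///");
        // Assumed pseudo-distributed NameNode address (illustrative only).
        URI hdfs = URI.create("hdfs://localhost:9000/");

        String input = "/data/SampleDocument.pdf";
        System.out.println(qualify(input, local)); // resolves to a file: URI
        System.out.println(qualify(input, hdfs));  // resolves to an hdfs: URI
    }
}
```

If this is what is happening, the same `-i /data/SampleDocument.pdf` argument refers to different files depending on which configuration the launch picks up. That would explain why a path that exists in HDFS is reported as missing.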

DigitalPebble

Dec 12, 2011, 3:49:23 AM
to digita...@googlegroups.com
Hi

> I am trying to run Behemoth on my local machine. I have set up Hadoop
> in pseudo-distributed mode and configured HDFS as well. Now I am
> trying to debug it in Eclipse by creating a project from the existing
> Ant build file (Tika module only), and I can see the Tika module set
> up as an individual project. [I have added all the jar files from
> behemoth-gate.job and the Tika module's libraries.]

Follow the steps on https://github.com/jnioche/behemoth/wiki/tutorial

You should get a behemoth-tika.job file, which can then be used in Hadoop.

Have a look at the tutorial. The input is not a file but a SequenceFile of BehemothDocuments.

> How do I set up the complete Behemoth project in debug mode?

See my reply this morning to another user about debugging Behemoth.

Julien
 

--
You received this message because you are subscribed to the Google Groups "DigitalPebble" group.
To post to this group, send an email to digita...@googlegroups.com.
To unsubscribe from this group, send email to digitalpebbl...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/digitalpebble?hl=en-GB.




--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com
http://www.digitalpebble.com
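
To make the point about the input concrete: a raw PDF and a SequenceFile can be told apart by their first bytes. Hadoop SequenceFiles begin with the bytes `SEQ` followed by a version byte, whereas PDF files begin with `%PDF`. The following is a minimal sketch, using only the Java standard library, that checks which of the two a local copy of the input looks like. The class and method names are hypothetical and are not part of Behemoth or Hadoop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class InputFormatSniffer {
    // True if 'head' begins with the ASCII bytes of 'magic'.
    public static boolean startsWith(byte[] head, String magic) {
        if (head.length < magic.length()) {
            return false;
        }
        for (int i = 0; i < magic.length(); i++) {
            if (head[i] != (byte) magic.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // SequenceFiles start with 'S', 'E', 'Q' followed by a version byte.
    public static boolean looksLikeSequenceFile(byte[] head) {
        return startsWith(head, "SEQ");
    }

    // PDF documents start with "%PDF".
    public static boolean looksLikePdf(byte[] head) {
        return startsWith(head, "%PDF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: InputFormatSniffer <local-file>");
            return;
        }
        byte[] head = new byte[4];
        int n;
        // Read up to the first four bytes of the local file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);
        }
        head = Arrays.copyOf(head, Math.max(n, 0));
        if (looksLikeSequenceFile(head)) {
            System.out.println("looks like a SequenceFile");
        } else if (looksLikePdf(head)) {
            System.out.println("looks like a raw PDF, not a SequenceFile");
        } else {
            System.out.println("unrecognised header");
        }
    }
}
```

A file that passes the `SEQ` check is only known to be some SequenceFile. Whether its values are BehemothDocuments depends on how it was generated, which the tutorial covers.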

rajini maski

Dec 12, 2011, 4:00:23 AM
to digita...@googlegroups.com
Thank you so much for the reply.


Is there a sample "SequenceFile of BehemothDocuments"? How do I generate these files? Is it by running the GateCorpusgenerator class from the GATE build? If so, what input do I need to give the corpus generator when running it as a Java application (any text file)?


Awaiting reply,

Regards,
Rajani



DigitalPebble

Dec 12, 2011, 4:08:29 AM
to digita...@googlegroups.com

Why don't you have a look at the tutorial first?
J.
