Jobs not showing up in the JobTracker


vavkkishore

Mar 14, 2011, 12:39:52 AM
to cascading-user
Hi All,

We set up a Hadoop cluster and developed simple Cascading flows
(processing two log files). When the Cascading jar is executed as
"hadoop jar my_cascading.jar" from the "master" node command prompt, the
program runs successfully and we can see the expected result.

But we don't see entries for these Cascading jobs in the Hadoop JobTracker
monitoring web page (http://localhost:50030/jobtracker.jsp).

Any clues what we are doing wrong here?

Thanks,
Kishore Veleti A.V.K.

Chris K Wensel

Mar 14, 2011, 2:58:46 AM
to cascadi...@googlegroups.com
If you are reading local files and the Flow is only one job, it will not run in the cluster.

ckw


--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support, and licensing for Cascading

vavkkishore

Mar 14, 2011, 3:13:06 PM
to cascading-user
Hi Chris,

Thanks for your reply. I am using Lfs and will try Hfs.

But is there a specific reason why a single job does not show up in the
JobTracker?

This is my requirement:

1. I have 7 GB of log files.
2. I need to parse the log files and satisfy around 5 requirements, which
I mentioned in one of my other threads in this group.
3. Per your response below, if a single job does not run in the cluster,
how can we use the power of Hadoop's distributed file system and
processing?

Am I missing something here? Please help me.

Thanks,
Kishore


Chris K Wensel

Mar 14, 2011, 3:49:11 PM
to cascadi...@googlegroups.com

It won't show in the JobTracker because job 1 runs in local mode. Local mode doesn't use the cluster; everything runs in the local JVM.

Using Lfs as a source or a sink forces Hadoop into local mode for the corresponding job, because the file being touched is local to the JVM, not in the cluster.
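
For illustration, a minimal sketch of the two tap types (Cascading 1.x constructors; the paths are made up):

    import cascading.scheme.TextLine;
    import cascading.tap.Hfs;
    import cascading.tap.Lfs;
    import cascading.tap.Tap;

    // Lfs points at the local file system; a flow whose taps are all Lfs
    // runs in Hadoop local mode, inside the client JVM, so it never
    // appears in the JobTracker.
    Tap localLogs = new Lfs(new TextLine(), "/home/user/logs/access.log");

    // Hfs points at HDFS; given a cluster configuration, the flow is
    // submitted as a MapReduce job and shows up in the JobTracker.
    Tap clusterLogs = new Hfs(new TextLine(), "logs/access.log");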

hope this helps.

ckw


vavkkishore

Mar 15, 2011, 11:57:07 PM
to cascading-user
Hi Chris,

Below is my code; to run it I execute the commands below from the
command prompt. Eventually the job fails with this exception:

java.io.IOException: Split class cascading.tap.hadoop.MultiInputSplit not found
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:326)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.MultiInputSplit
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)


Steps executed to run the program:
###############################


===========================================================
Step 1: Copy the log files from the local file system to HDFS
===========================================================
./hadoop dfs -copyFromLocal /home/hadoopcluster/sample_programs_and_data/sample_logs_data

(the above command copied 2 log files to HDFS)

===========================================================
Step 2: Run the Hadoop Cascading JAR file
===========================================================

./hadoop jar /home/hadoopcluster/cluster_problem_resolution/cascading-log-analyzer.jar cluster_resolution cluster_resolution_output

OUTPUT: I get the exception detailed above.

===========================================================
Below is the Java code for the same:
===========================================================

import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.aggregator.Sum;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.Tap;
import cascading.tuple.Fields;

public class MainClass {

    private static Properties properties = new Properties();

    public static void main(String[] args) throws IOException {

        // Tell Cascading which jar to ship to the cluster.
        FlowConnector.setApplicationJarClass(properties, MainClass.class);

        Tap input = new Hfs(new TextLine(), args[0]);
        Pipe pipe = new Each("Requirements", new Fields("line"),
                new CustomStringParserFunction());

        // Requirement: total number of lines
        Tap totalNumberOfLinesSink = new Hfs(new TextLine(), args[1] + "/linecounter");
        getTotalNumberOfLines(input, totalNumberOfLinesSink, pipe);

        // Requirement: method count
        Tap methodSink = new Hfs(new TextLine(), args[1] + "/methodcount");
        methodCount(input, methodSink, pipe);

        // Requirement: unique requests
        Tap uniqueRequestSink = new Hfs(new TextLine(), args[1] + "/uniquerequest");
        uniqueRequest(input, uniqueRequestSink, pipe);

        // Requirement: browser agent
        Tap browserAgentSink = new Hfs(new TextLine(), args[1] + "/browseragent");
        browserAgent(input, browserAgentSink, pipe);

        // Requirement: operating system ratio
        Tap operatingSystemSink = new Hfs(new TextLine(), args[1] + "/operatingsystem");
        operatingSystem(input, operatingSystemSink, pipe);

        // Requirement: failed pages
        Tap failedPagesSink = new Hfs(new TextLine(), args[1] + "/failedpages");
        failedPages(input, failedPagesSink, pipe);

        // Copy the results from HDFS back to the local file system.
        JobConf job = new JobConf(MainClass.class);
        FileSystem hdfs = FileSystem.get(job);
        hdfs.copyToLocalFile(new Path(args[1]), new Path(args[2]));
    }

    private static void executeFlows(Tap input, Tap sink, Pipe pipe) {
        FlowConnector flowConnector = new FlowConnector(properties);
        Flow flow = flowConnector.connect(input, sink, pipe);

        flow.start();
        flow.complete();
    }

    private static void methodCount(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("method"));
        pipe1 = new Every(pipe1, new Count(new Fields("count1")));

        executeFlows(input, sink, pipe1);
    }

    private static void uniqueRequest(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("resource"));
        pipe1 = new Every(pipe1, new Count(new Fields("count2")));

        executeFlows(input, sink, pipe1);
    }

    private static void browserAgent(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("browserAgent"));
        pipe1 = new Every(pipe1, new Count(new Fields("count3")));

        executeFlows(input, sink, pipe1);
    }

    private static void operatingSystem(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("operatingSystem"));
        pipe1 = new Every(pipe1, new Count(new Fields("count4")));

        executeFlows(input, sink, pipe1);
    }

    private static void failedPages(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("isErrorPage"));
        // "isErrorPage == 1" is only the declared name of the Sum output field
        pipe1 = new Every(pipe1, Fields.GROUP, new Sum(new Fields("isErrorPage == 1")));

        executeFlows(input, sink, pipe1);
    }

    private static void getTotalNumberOfLines(Tap input, Tap sink, Pipe pipe) {
        Pipe pipe1 = new GroupBy(pipe, new Fields("dummyCounter"));
        pipe1 = new Every(pipe1, Fields.GROUP, new Sum(new Fields("dummyCounter == 1")));

        executeFlows(input, sink, pipe1);
    }
}


import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;

public class CustomStringParserFunction extends BaseOperation implements Function {

    public CustomStringParserFunction() {
        super(1, new Fields("browserAgent", "operatingSystem", "method",
                "resource", "isErrorPage", "dummyCounter"));
    }

    @Override
    public void operate(FlowProcess flowProcess, FunctionCall functionCall) {
        TupleEntry arguments = functionCall.getArguments();
        String[] array = arguments.getString(0).split(" ");

        Tuple tuple = new Tuple();
        tuple.add(array[21]);                      // browserAgent
        tuple.add(array[15]);                      // operatingSystem
        tuple.add(array[5].replace("\"", ""));     // method, quotes stripped
        tuple.add(array[6]);                       // resource
        tuple.add(array[8].equals("200") ? 0 : 1); // isErrorPage: 1 unless HTTP 200
        tuple.add(1);                              // dummyCounter, always 1

        functionCall.getOutputCollector().add(tuple);
    }
}
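
As an aside, each executeFlows() call above builds and runs one Flow at a time. If the helpers returned their Flows instead of completing them, the flows could be handed to a single Cascade and run together; a sketch, assuming Cascading 1.x's CascadeConnector (the flow and pipe names here are made up, not from the code above):

    import cascading.cascade.Cascade;
    import cascading.cascade.CascadeConnector;
    import cascading.flow.Flow;
    import cascading.flow.FlowConnector;

    // build the flows without running them
    Flow methodCountFlow = new FlowConnector(properties).connect(input, methodSink, methodCountPipe);
    Flow browserAgentFlow = new FlowConnector(properties).connect(input, browserAgentSink, browserAgentPipe);

    // connect and run them as one unit; the Cascade schedules the flows
    Cascade cascade = new CascadeConnector().connect(methodCountFlow, browserAgentFlow);
    cascade.complete();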

Chris K Wensel

Mar 16, 2011, 3:32:43 PM
to cascadi...@googlegroups.com
This is very likely due to not putting the cascading jar or some dependency in the 'lib' folder of your jar file.
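
For reference, Hadoop unpacks the job jar on each task node and adds every jar under its lib/ directory to the task classpath, so a listing should look roughly like this (the jar names below are examples and depend on your Cascading version):

    $ jar -tf cascading-log-analyzer.jar
    META-INF/MANIFEST.MF
    MainClass.class
    CustomStringParserFunction.class
    lib/cascading-1.2.x.jar
    lib/jgrapht-jdk1.6.jar
    ...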

ckw


vavkkishore

Mar 17, 2011, 11:28:06 AM
to cascading-user
Hi Chris,

I found that the JAR file has a lib directory containing all the
Cascading-related JAR files, but it is still not working. What could be
the reason?

Please help; I am not able to proceed further.

Thanks,
Kishore Veleti A.V.K.

Chris K Wensel

Mar 17, 2011, 4:40:44 PM
to cascadi...@googlegroups.com

If

> jar -tf log-analyzer.jar

doesn't list cascading in the lib folder, I've no idea what else to look for. This isn't a Cascading issue but a Hadoop configuration issue. It could well be that the Hadoop config in your environment isn't the same as your cluster's, for example.
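
One thing worth checking along those lines (a guess at a common cause, not something established in this thread): if the client-side mapred-site.xml leaves mapred.job.tracker at its default of "local", Hadoop uses the local job runner and nothing ever reaches the cluster's JobTracker. It should name the real JobTracker, for example:

    <property>
      <name>mapred.job.tracker</name>
      <!-- hypothetical host:port; use your cluster's own values -->
      <value>master:9001</value>
    </property>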

ckw

