Run different types of jobs


Aleks.E

May 7, 2013, 5:09:14 PM
to lemur...@googlegroups.com
Hello,

I am new to lemur, and I have some code that I am trying to change to run using lemur.

The job is currently controlled by a bash script, and it is a multi-step process that uses streaming and pig.

I was looking at the documentation and am not sure whether it is possible to create a job pipeline that uses multiple artifacts and parameters. The whole thing is a pipeline, so the output of one job will be the input for the next.

Is there a way to do this in lemur? If so, are there any examples?

Regards,
Aleks

Marc Limotte

May 8, 2013, 10:56:58 AM
to lemur...@googlegroups.com
Hi Aleks,

Welcome. At Climate Corp we usually use JAR-based jobs (defined in Cascalog), but here's an example of a jobdef for a Streaming job:

https://gist.github.com/mlimotte/5541018

The defstep defines a single step in the process. A pig step would be similar. I haven't tried pig, so I don't know the exact syntax, but if you give me an elastic-mapreduce command line, I can help translate it.

You can include as many defsteps as you want in the jobdef.  The ones that are actually run are controlled by the fire! call, as shown in the example.

Alternatively, the steps can be in a fn:

(defn get-steps [eopts] (vector first-step second-step))
(fire! sample-cluster get-steps)

defsteps create maps, so you can also define one defstep and then modify it (using standard Clojure, e.g. assoc) in a fn like get-steps to produce variations.
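As a rough sketch of that idea, here is the assoc approach using a plain Clojure map as a stand-in for what a defstep produces. The keys (:step-name, :input, :output) and the bucket paths are hypothetical placeholders, not lemur's actual option names; substitute whatever keys your real defstep contains.

```clojure
;; Hypothetical stand-in for the map a defstep would produce.
;; The keys and paths are illustrative assumptions, not lemur's real options.
(def base-step
  {:step-name "process"
   :input     "s3://example-bucket/input/"
   :output    "s3://example-bucket/output/"})

(defn get-steps
  "Returns two variations of base-step, built with assoc.
  eopts is accepted to match the fn shape shown above but is unused here."
  [eopts]
  [(assoc base-step :step-name "process-a" :output "s3://example-bucket/out-a/")
   (assoc base-step :step-name "process-b" :output "s3://example-bucket/out-b/")])
```

Because assoc returns a new map, base-step itself is left unchanged and can be reused for further variations.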

For pipelining, Elastic MapReduce runs each step serially. So if the first step writes to some path, specify that same path as an input in the next step. My example uses S3 paths, but you could use HDFS paths as well for the intermediate data. From Amazon's docs: "If you use HDFS, prepend the path with hdfs:///. Make sure to use three slashes (///), as in hdfs:///home/hadoop/sampleInput2/."
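To illustrate the path chaining, here is a minimal sketch with plain Clojure maps standing in for step definitions. The keys, step names, and paths are all hypothetical, and chained? is just a helper invented for this example to check that each step reads what the previous one wrote; it is not part of lemur.

```clojure
;; Hypothetical step maps; keys and paths are illustrative assumptions.
;; The intermediate data lives on HDFS, so the path uses three slashes.
(def intermediate "hdfs:///tmp/pipeline/step1-out/")

(def first-step
  {:step-name "first"
   :input     "s3://example-bucket/raw/"
   :output    intermediate})

(def second-step
  {:step-name "second"
   :input     intermediate
   :output    "s3://example-bucket/final/"})

(defn chained?
  "True when every step's :input equals the previous step's :output."
  [steps]
  (every? (fn [[a b]] (= (:output a) (:input b)))
          (partition 2 1 steps)))
```

Defining the intermediate path once and referring to it from both steps keeps the two ends of the pipeline from drifting apart.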

Marc





 
 
