Hi Aleks,
Welcome. At Climate Corp we usually use JAR-based jobs (defined in Cascalog), but here's an example of a jobdef for a Streaming job:
https://gist.github.com/mlimotte/5541018
A defstep defines a single step in the job. A Pig step would be similar; I haven't tried Pig, so I don't know the exact syntax, but if you give me an elastic-mapreduce command line, I can help translate it.
You can include as many defsteps as you want in the jobdef. Which steps actually run is controlled by the fire! call, as shown in the example.
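To make that concrete, here's a minimal sketch of defining several steps and selecting a subset with fire!. The step names and the :main-class key are hypothetical, for illustration only; see the gist above for real Streaming syntax:

(defstep extract-step
  :main-class "com.example.Extract")   ; hypothetical class

(defstep transform-step
  :main-class "com.example.Transform")

(defstep load-step
  :main-class "com.example.Load")

;; Only the steps named in fire! are run; load-step is
;; defined above but skipped here.
(fire! sample-cluster extract-step transform-step)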
Alternatively, the steps can be in a fn:
(defn get-steps [eopts] (vector first-step second-step))
(fire! sample-cluster get-steps)
Each defstep creates a Map, so you can also define one defstep and then modify it (using standard Clojure, e.g. assoc) in a fn like get-steps to produce variations.
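For example, something like this sketch, where base-step and the :args.output key are hypothetical names I'm using for illustration:

(defstep base-step
  :main-class "com.example.Main")   ; hypothetical

(defn get-steps [eopts]
  ;; A defstep is just a Map, so plain assoc derives variants.
  [(assoc base-step :args.output "s3://my-bucket/run-a/")
   (assoc base-step :args.output "s3://my-bucket/run-b/")])

(fire! sample-cluster get-steps)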
For pipelining: Elastic MapReduce runs each step serially, so if the first step writes to some path, specify that same path as the input of the next step. My example uses S3 paths, but you could use HDFS paths for the intermediate data as well. From Amazon's docs: "If you use HDFS, prepend the path with hdfs:///. Make sure to use three slashes (///), as in hdfs:///home/hadoop/sampleInput2/."
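A quick sketch of that chaining; the :args.input/:args.output keys and paths are assumptions for illustration, so check the gist for the keys your step type actually takes:

(defstep first-step
  :args.input  "s3://my-bucket/raw/"            ; hypothetical paths
  :args.output "s3://my-bucket/intermediate/")

(defstep second-step
  :args.input  "s3://my-bucket/intermediate/"   ; same path as above
  :args.output "s3://my-bucket/final/")

;; Steps run serially, so second-step sees first-step's output.
;; An HDFS intermediate would work too, e.g. "hdfs:///home/hadoop/intermediate/".
(fire! sample-cluster first-step second-step)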
Marc