PigPen STREAM command


ajaybap...@gmail.com

Feb 9, 2016, 11:25:06 AM
to PigPen Support
Hello,

Is there something similar to Pig's STREAM operator in PigPen (https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#STREAM), or is there a workaround to get the same behavior in PigPen?

Thank you so much, in advance.

- Ajay

Matt Bossenbroek

Feb 9, 2016, 11:31:41 AM
to ajaybap...@gmail.com, PigPen Support
We don't have anything that uses the STREAM operator, because you usually don't need it. Pig generally uses it as a way to run Perl as a UDF, which isn't something I'd recommend. Personally, I just use pig/map instead.

What's the use case?

-Matt


ajaybap...@gmail.com

Feb 9, 2016, 11:56:32 AM
to PigPen Support, ajaybap...@gmail.com

Hey Matt,

Thank you for the information.

Use case:
We have legacy code that uses the Pig STREAM command to stream data to a classifier written in Python (nltk, scikit-learn, etc.).

- Ajay


Matt Bossenbroek

Feb 9, 2016, 12:28:44 PM
to ajaybap...@gmail.com, PigPen Support
Gotcha. For that, I would probably use Jython to call your Python code from within a pig/map command.

If that doesn't work (and you're feeling adventurous), I don't think it would be terribly difficult to extend PigPen to generate the appropriate STREAM command.

You would need to create a new stream operator that would look something like this:

(ns pigpen-demo.core
  (:require [pigpen.raw :as pig-raw]
            [pigpen.core :as pig]
            [pigpen.core.op :as pig-op]
            [pigpen.pig.script :as pig-script]))

;; Raw operator: a :stream command node that carries the shell command.
(defn stream$
  [command relation]
  (->
    (#'pigpen.raw/command :stream relation {})
    (assoc :command command
           :field-type :native)))

;; Public operator: serialize each value with pr-str before the stream
;; and parse the command's output with read-string afterwards.
(defn stream [command relation]
  (->> relation
    (pig-op/bind$ (pig-op/map->bind 'pr-str) {:field-type :native})
    (stream$ command)
    (pig-op/bind$ (pig-op/map->bind 'read-string) {:field-type-in :native})))

;; Script generation for Pig. Pig's STREAM syntax takes the command in
;; backticks (or a DEFINE'd alias), so backticks are used here.
(defmethod pig-script/command->script :stream
  [{:keys [id ancestors command]}]
  (let [pig-id (#'pigpen.pig.script/escape-id id)
        relation-id (#'pigpen.pig.script/escape-id (first ancestors))]
    (str pig-id " = STREAM " relation-id " THROUGH `" command "`;")))



The first function defines the raw Pig operator, the second handles serialization around the stream, and the third implements the script generation for Pig. Of course, this wouldn't work in local mode or with Cascading (you'd have to implement their respective multimethods), but let me know if you're interested in that path and I can certainly provide some help and guidance.
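
To make the serialization and script-generation pieces concrete, here is a minimal sketch that runs in a plain Clojure REPL with no PigPen dependency. It is an illustration under assumptions, not PigPen's API: round-trip and stream-line are hypothetical helpers, the ids "stream2" and "bind1" and the command "python classify.py" are made up, and stream-line follows the same str pattern as the command->script method while using the backtick quoting shown in the Pig STREAM documentation.

```clojure
;; Hypothetical helper: the round trip performed by the two bind$ steps,
;; pr-str before the stream and read-string after it.
(defn round-trip [value]
  (read-string (pr-str value)))

;; Hypothetical helper: builds the Pig statement text from ids that are
;; assumed to be escaped already.
(defn stream-line [pig-id relation-id command]
  (str pig-id " = STREAM " relation-id " THROUGH `" command "`;"))

(round-trip {:id 1 :text "some text"})
;; => {:id 1, :text "some text"}

(stream-line "stream2" "bind1" "python classify.py")
;; => "stream2 = STREAM bind1 THROUGH `python classify.py`;"
```

Because the output of the streamed command is passed through read-string, the external program would presumably need to write one EDN-readable value per line; anything else would fail at the deserialization step.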


-Matt
