We are trying out PigPen in our project, and we are currently stuck at one point. We were wondering if anyone could help.
Problem:
We have a training file from which we need to build a model (the model is Clojure code), and that model will be used in one of the PigPen steps for classification. What is the best way to do that?
Example:
(def model (atom {})) ;; bad practice, used only for the sake of simplicity in this example
(defn train-model [model-file]
  ... ;; trains the model, records features in model
  )
(defn classify [record]
  .... ;; classifies the record, uses model
  )
(defn process [training-file input-path output-path]
  (let [trained-model (train-model training-file)] ;; is this the right thing to do?
    (->>
      (pig/load-string input-path)
      (pig/map classify)
      (pig/store output-path))))
(pigpen/write-script "large-scale-classification.pig" (process "$training-file" "$input-file" "$output-path"))
Thank you in advance.
(defn train-model [training-file]
  (->>
    (pig/load-string training-file)
    (pig/reduce ... ))) ;; reduce model into single record if it isn't already
(defn classify [record model]
  .... ;; classify, uses model
  )
(defn process [training-file input-path output-path]
  (let [input-data (pig/load-string input-path)
        trained-model (train-model training-file)]
    (->>
      ;; the constant key pairs every input record with the single model record
      (pig/join [(input-data :on (constantly 42))
                 (trained-model :on (constantly 42))]
                classify ;; called as (classify record model) for each joined pair
                {:strategy :replicated})
      (pig/store output-path))))
More on Pig joins here: https://pig.apache.org/docs/r0.15.0/perf.html#replicated-joins
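To see why joining on a constant key pairs every input record with the model, here is a minimal sketch in plain Clojure, with no PigPen or Pig involved. The names toy-model, toy-classify, and constant-key-join, as well as the sample records, are hypothetical and only there for illustration; a real replicated join is executed by Pig, not by this function.

```clojure
;; Plain-Clojure illustration of a join on a constant key.
;; toy-model, toy-classify and the sample records are hypothetical.
(def toy-model {:threshold 5})

(defn toy-classify
  "Labels a record :high or :low relative to the model's threshold."
  [record model]
  (if (> (:value record) (:threshold model)) :high :low))

(defn constant-key-join
  "Pairs every element of left with every element of right whose key matches,
  then applies f to each pair. With (constantly 42) on both sides, all keys match."
  [left left-key right right-key f]
  (for [l left
        r right
        :when (= (left-key l) (right-key r))]
    (f l r)))

(def results
  (constant-key-join [{:value 3} {:value 8}] (constantly 42)
                     [toy-model] (constantly 42)
                     toy-classify))
```

Because the right-hand side holds a single model record, each input record shows up exactly once in the output, already classified.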
(defn process [model-file input-path output-path]
  (->>
    (pig/load-string input-path)
    (pig/map (let [model (load-trained-model model-file)]
               (fn [record]
                 (classify record model))))
    (pig/store output-path)))
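The load-once-and-close-over pattern can be tried locally in plain Clojure before wiring it into a script. The sketch below is an assumption-heavy stand-in: load-trained-model here parses an EDN string rather than reading a file, the model is just a lookup map, and make-classifier is a hypothetical helper name.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical stand-in: a real version would read and parse the model file.
(defn load-trained-model [model-source]
  (edn/read-string model-source))

;; Toy classifier: looks the record up in the model, defaulting to :unknown.
(defn classify [record model]
  (get model record :unknown))

(defn make-classifier
  "Loads the model once and returns a one-argument function that closes over it."
  [model-source]
  (let [model (load-trained-model model-source)]
    (fn [record]
      (classify record model))))

(def classifier (make-classifier "{\"spam\" :bad, \"ham\" :good}"))
(def labels (mapv classifier ["spam" "ham" "eggs"]))
```

The model is loaded when make-classifier is called, not on each call to the returned function, which appears to be the intent of wrapping the fn in a let above.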