Hi,

I have a lookup function that runs a MapReduce job to read small
dimension data from S3 and then loads it into a hash map. I memoized
the function so that the map is kept in memory. The code looks like
this:

(defn- get-referral-dimension-map* [referral-dimension-path]
  ;; this makes a query and selects fields !referral_key and !ref_name
  (let [rd-src (sdt/get-query referral-dimension-path
                              ["!referral_key" "!ref_name"])
        tuples (??- rd-src)]
    (into {} (first tuples))))

(def get-referral-dimension-map (memoize get-referral-dimension-map*))

(defn get-referral-name [referral-key referral-dimension-path]
  (let [m (get-referral-dimension-map referral-dimension-path)]
    (m referral-key)))

This gets called in a "main" query, something like:
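
(<- [?referral_name]
    (src ?referral_key)
    ;; referral-dimension-path is passed as the second argument,
    ;; matching the two-argument definition of get-referral-name
    (get-referral-name ?referral_key referral-dimension-path :> ?referral_name))
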
Oddly, the behavior I am observing is that a MapReduce job is launched
for the referral-dimension data for every map task in the "main" query.
It seems like once one map task has called the get-referral-name
function, the result should be memoized, and all subsequent map tasks
on that node that call the function should not need to redo the
MapReduce job.
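
To make my expectation concrete, here is a minimal sketch in plain Clojure (no Cascalog or S3 involved) of what I assume memoize does within a single JVM. The names slow-lookup*, slow-lookup and call-count are made up for illustration and stand in for the real lookup above:

```clojure
;; Hypothetical stand-in for the expensive lookup.
;; call-count records how many times the underlying function really runs.
(def call-count (atom 0))

(defn- slow-lookup* [path]
  (swap! call-count inc)
  {"k1" (str "name-from-" path)})

(def slow-lookup (memoize slow-lookup*))

;; Two calls with the same argument: the body runs only once.
(slow-lookup "some/path")
(slow-lookup "some/path")
(println @call-count) ; prints 1

;; A different argument is a cache miss, so the body runs again.
(slow-lookup "other/path")
(println @call-count) ; prints 2
```

That is the behavior I was expecting to see across map tasks, at least for tasks that end up sharing the same JVM.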