Combining two map files as inputs?


Jeremiah Rounds

Oct 14, 2015, 12:37:22 PM
to rhipe
Hi,

I forgot how to combine two map file directories using Rhipe and make them both inputs to a MapReduce job. Do they need to be converted to sequence files first? I have never tried.


Thanks,
Jeremiah

Saptarshi Guha

Oct 14, 2015, 1:13:19 PM
to rh...@googlegroups.com
No need, I think you can do input=rhfmt(type='map', dir=c(path1,path2))

--

---
You received this message because you are subscribed to the Google Groups "rhipe" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhipe+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jeremiah rounds

Oct 14, 2015, 1:39:08 PM
to rh...@googlegroups.com
Thanks Saptarshi.  Still re-learning everything =)


Jeremiah Rounds

Oct 14, 2015, 6:51:19 PM
to rhipe
If Ryan reads this, I would be curious whether there is a datadr idiom for merging ddo objects, but at the moment I have no pressing need. What I did follows; both dxsummary_2 and ili_dmisid_mmwr_moving_average were ddo storage directories.




path1 =  "/user/roun308/chiron/dxsummary_2"
path2 =  "/user/roun308/chiron/ili_dmisid_mmwr_moving_average"
output = "/user/roun308/chiron/all_mmwr"
minput=rhfmt(type='map',folder=c(path1,path2))
moutput = rhfmt(type='map', folder=output)
map = expression({
  for(i in seq_along(map.values)){
    # Kind of hacky, but the class of the map value tells us which input it came from.
    v = map.values[[i]]
    k = map.keys[[i]]
    # class() can return more than one string, so switch on the first only.
    type_of = switch(class(v)[1],
                     data.frame = {"total_visits"},
                     list = {"ili_visits"},
                     {stop("Unknown value class: ", class(v)[1])}
                )
    new.value = list()
    new.value[[type_of]] = v
    rhcollect(k, new.value)
  }
  
})
reduce = expression(
  pre= {
    data = list()
  },
  reduce = {
    data = append(data, reduce.values)
  },
  post = {
    rhcollect(reduce.key, data)
  }
)
mapred =list(
  rhipe_map_buff_size=100, 
  mapreduce.map.memory.mb=4000,   
  mapreduce.map.java.opts= "-Xmx2000M",
  #
  rhipe_reduce_buff_size=100,
  mapreduce.reduce.memory.mb=4000,
  mapreduce.reduce.java.opts= "-Xmx2000M",
  #
  mapreduce.job.maps= 100,
  mapreduce.job.reduces=100,
  mapreduce.job.name = "update"
)

ret = rhwatch(map, reduce, input=minput, output=moutput, mapred=mapred)
hdfs_mmwr = hdfsConn("/user/roun308/chiron/all_mmwr")
ddo_mmwr = ddo(hdfs_mmwr)
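
For anyone who wants to check what the merged values look like before running the job, here is a rough plain-R sketch of the same tag-then-append logic as the map and reduce above. It does not use Rhipe or Hadoop at all. The keys, the column names, and the helper tag_value are made up for illustration, and the grouping with split() only stands in for the shuffle; the flattening at the end is an optional extra, not part of the job above.

```r
# Plain-R simulation of the tag-then-append merge; no Hadoop or Rhipe needed.
# All keys and values below are invented for illustration.

# Two hypothetical inputs sharing keys: one holds data frames, the other lists.
input_a <- list(
  list(key = "2015-40", value = data.frame(visits = 120L)),
  list(key = "2015-41", value = data.frame(visits = 135L))
)
input_b <- list(
  list(key = "2015-40", value = list(ili = 7L)),
  list(key = "2015-41", value = list(ili = 9L))
)

# Map step: wrap each value in a one-element list named after its source,
# inferred from the value's class (tag_value is a hypothetical helper).
tag_value <- function(v) {
  type_of <- switch(class(v)[1],
    data.frame = "total_visits",
    list       = "ili_visits",
    stop("Unknown value class: ", class(v)[1])
  )
  out <- list()
  out[[type_of]] <- v
  out
}
mapped <- lapply(c(input_a, input_b), function(kv) {
  list(key = kv$key, value = tag_value(kv$value))
})

# Stand-in for the shuffle: group the mapped values by key.
keys <- vapply(mapped, function(kv) kv$key, character(1))
grouped <- split(lapply(mapped, function(kv) kv$value), keys)

# Reduce step: append every value seen for a key onto one list.
merged <- lapply(grouped, function(values) {
  data <- list()
  data <- append(data, values)
  data
})

# Each key now holds a list of one-element named lists, one per input.
str(merged[["2015-40"]], max.level = 2)

# Optional: flatten one level so each source can be reached by name.
flat <- lapply(merged, function(d) unlist(d, recursive = FALSE))
names(flat[["2015-40"]])
```

In this sketch each key ends up with an unnamed list whose elements are the one-element tagged lists, so a consumer has to look one level down for total_visits or ili_visits. Flattening one level, as in the last step, is one way to reach them by name, assuming each key appears at most once per input.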




Ashrith

Oct 15, 2015, 3:05:22 PM
to rhipe
I usually pass rhls(parentfolder)$file as the input, which brings in all the files. But that is for taking in everything under the parent folder, files and folders alike.