Using memory object as input instead of a disk file

nae...@ic.ufal.br

May 19, 2015, 5:45:04 AM
to rh...@googlegroups.com
How do I use an object in memory as input to rhwatch?

I have an object:
in_mem <- rhread("small.csv",type="text",max=-1,mc=FALSE,buffsize=2*1024*1024)

and inside rhwatch I want to use it as input, like this:

mr <- rhwatch(
  map      = map,
  reduce   = reduce,
  input    = rhfmt(in_mem,type="sequence"),
  output   = rhfmt("outputWB", type = "sequence"),
  readback = TRUE 
)

But I'm getting errors this way.

Saptarshi Guha

May 19, 2015, 12:31:49 PM
to rh...@googlegroups.com

On Tue, May 19, 2015 at 2:45 AM, <nae...@ic.ufal.br> wrote:
in_mem <- rhread("small.csv",type="text"


It's not so slick. You'll need to do a step before that: write in_mem to disk with rhwrite, in one of two ways.

1) If in_mem is a matrix, data frame, or list, rhwrite will break it into chunks and write the chunks across files (the keys will be meaningless; in the call to rhwrite, this corresponds to kvpairs=FALSE)
or
2) You can write it as a traditional key-value sequence, but you'll need to convert in_mem into a list of lists first. Each sublist has length two: the key and the value (in the call to rhwrite, this corresponds to kvpairs=TRUE)


Example 1.
dim(in_mem)=c(1,length(in_mem))
rhwrite(in_mem, file="/user/sguha/tmp/xw",chunk=10,numfiles=3,kvpairs=FALSE)

> rhls("/user/sguha/tmp/xw")
  permission owner group        size          modtime                      file
1 -rw-r--r-- sguha users   661 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_1
2 -rw-r--r-- sguha users   672 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_2
3 -rw-r--r-- sguha users   670 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_3
4 -rw-r--r-- sguha users   342 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_4


> rhread("/user/sguha/tmp/xw/part_1")
Read 2 objects(0.55 KB) in 0.18 seconds
[[1]]
[[1]][[1]]
NULL

[[1]][[2]]
      [,1]                  
 [1,] "1,2008-07,1.4"
 [2,] "2,2008-08,21"
 [3,] "3,2008-09,3"
 [4,] "4,2008-10,4"
 [5,] "5,2008-11,4"
 [6,] "6,2008-12,1"
 [7,] "7,2009-01,1"
 [8,] "8,2009-02,3"
 [9,] "9,2009-03,1"
[10,] "10,2009-04,3"


[[2]]
[[2]][[1]]
NULL

[[2]][[2]]
      [,1]                  
 [1,] "11,2009-05,1"
 [2,] "12,2009-06,5"
 [3,] "13,2009-07,7.4"


Example 2:
in_mem2 = Map(function(a,b) list(a, b), seq_along(in_mem) , in_mem)
rhwrite(in_mem2, file="/user/sguha/tmp/xw", chunk=10,numfiles=3)
> rhread("/user/sguha/tmp/xw/part_1")
Read 2 objects(0.09 KB) in 0.05 seconds
[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] "1,2008-07,1.4"


[[2]]
[[2]][[1]]
[1] 11

[[2]][[2]]
[1] "11,2009-05,3"
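
The list-of-pairs shape used in Example 2 can be checked with base R alone, before anything is written to HDFS. The following is a minimal sketch, assuming in_mem is a character vector of CSV lines; the three sample lines are copied from the output above, and the names keys, values and fields are only illustrative.

```r
# Stand-in for in_mem: a character vector of CSV lines
# (sample values copied from the rhread output above).
in_mem <- c("1,2008-07,1.4", "2,2008-08,21", "3,2008-09,3")

# Same construction as Example 2: one list(key, value) per line.
in_mem2 <- Map(function(a, b) list(a, b), seq_along(in_mem), in_mem)

# Every element should be a list of length two: the key, then the value.
stopifnot(all(vapply(in_mem2, length, integer(1)) == 2L))

# Pull the keys and values back out of the pairs.
keys   <- vapply(in_mem2, function(kv) kv[[1]], integer(1))
values <- vapply(in_mem2, function(kv) kv[[2]], character(1))

# Split each CSV line into its comma-separated fields (one row per line).
fields <- do.call(rbind, strsplit(values, ",", fixed = TRUE))

print(keys)
print(fields)
```

The same unpacking should work on what rhread returns for a part file, since that is also a list of two-element lists.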



You then do the following (rhwrite writes type 'sequence'):
rhwatch(..., input="/user/sguha/tmp/xw", etc ...
There is no need for rhfmt in the input; RHIPE defaults to sequence.
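
Before submitting the job, it can help to dry-run the map expression locally against a few of the pairs. This is only a sketch under assumptions: inside a real map task RHIPE supplies map.keys, map.values and rhcollect, whereas here all three are faked in plain R (the rhcollect below is a hypothetical stand-in that just records what would be emitted), and the choice of which fields to emit is purely illustrative.

```r
# Hypothetical local stand-in for RHIPE's rhcollect():
# record each emitted (key, value) pair in a list instead of sending it to Hadoop.
emitted <- list()
rhcollect <- function(key, value) {
  emitted[[length(emitted) + 1L]] <<- list(key, value)
  invisible(NULL)
}

# Fake the lists a map task would see (values copied from the output above).
map.keys   <- list(1L, 2L, 3L)
map.values <- list("1,2008-07,1.4", "2,2008-08,21", "3,2008-09,3")

# Map expression: emit the second comma-separated field as the key
# and the third field, converted to numeric, as the value.
map <- expression({
  for (i in seq_along(map.values)) {
    parts <- strsplit(map.values[[i]], ",", fixed = TRUE)[[1]]
    rhcollect(parts[2], as.numeric(parts[3]))
  }
})

# Evaluate the expression locally and inspect what it would emit.
eval(map)
str(emitted)
```

If the emitted pairs look right, the same map expression can be passed to rhwatch with the rhwrite output directory as input.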

Depending on the data structure of in_mem, Example 1 or 2 might be easier.

HTH
Saptarshi

nae...@ic.ufal.br

May 23, 2015, 2:13:08 AM
to rh...@googlegroups.com, saptars...@gmail.com
Thanks!
It really helped.
It's nice to know I can count on the help of kind people like you.