Using memory object as input instead of a disk file

nae...@ic.ufal.br

May 19, 2015, 5:45:04 AM

to rh...@googlegroups.com

How do I use an object in memory as input to rhwatch?

I have an object:
in_mem <- rhread("small.csv",type="text",max=-1,mc=FALSE,buffsize=2*1024*1024)

and I want to use it as the input to rhwatch, like this:

mr <- rhwatch(
map      = map,
reduce   = reduce,
input    = rhfmt(in_mem,type="sequence"),
output   = rhfmt("outputWB", type = "sequence"),
readback = TRUE
)

But I'm getting errors this way.

Saptarshi Guha

May 19, 2015, 12:31:49 PM

to rh...@googlegroups.com

On Tue, May 19, 2015 at 2:45 AM, <nae...@ic.ufal.br> wrote:
> in_mem <- rhread("small.csv",type="text"

It's not so slick. You'll need an extra step first: write in_mem to disk (HDFS) with rhwrite. There are two ways to do it:

1) If in_mem is a matrix, data frame, or list, rhwrite will break it into chunks and write the chunks across files (the key will be meaningless).

2) You can write it as a traditional key-value sequence, but you'll need to convert in_mem into a list of lists. Each sublist has length two: the key and the value (in the call to rhwrite, this corresponds to kvpairs=TRUE).
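If you want to see what each layout looks like before touching HDFS, the two shapes can be mimicked in plain R. This is only an illustration, assuming in_mem holds one CSV text line per element (the sample lines are copied from the rhread output later in this thread); it does not reproduce rhwrite's internals, and the names lines, option1 and option2 are made up for the example.

```r
# Hypothetical stand-in for in_mem: one CSV text line per element
# (sample values copied from the rhread() output in this thread).
lines <- c("1,2008-07,1.4", "2,2008-08,21", "3,2008-09,3", "4,2008-10,4",
           "5,2008-11,4", "6,2008-12,1", "7,2009-01,1", "8,2009-02,3",
           "9,2009-03,1", "10,2009-04,3", "11,2009-05,1", "12,2009-06,5",
           "13,2009-07,7.4")

# Option 1 (illustration only): groups of up to 10 lines, each stored as
# list(NULL, <one-column matrix>), mirroring the NULL keys seen in Example 1.
chunk_size <- 10
groups <- split(lines, ceiling(seq_along(lines) / chunk_size))
option1 <- unname(lapply(groups, function(g) list(NULL, matrix(g, ncol = 1))))

# Option 2: one list(key, value) pair per line, keyed by position,
# which is the shape used with kvpairs=TRUE.
option2 <- Map(function(a, b) list(a, b), seq_along(lines), lines)

str(option1[[2]])  # NULL key plus a 3 x 1 character matrix
str(option2[[1]])  # integer key 1 plus the first line
```

Option 2 costs one small object per record, while option 1 packs many records into each object; which is preferable depends on how the map step wants to consume them.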

Example 1:
dim(in_mem)=c(1,length(in_mem))
# kvpairs=FALSE: objects are written with NULL (meaningless) keys, as in option 1
rhwrite(in_mem, file="/user/sguha/tmp/xw",chunk=10,numfiles=3,kvpairs=FALSE)

> rhls("/user/sguha/tmp/xw")
permission owner group        size          modtime                      file
1 -rw-r--r-- sguha users   661 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_1
2 -rw-r--r-- sguha users   672 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_2
3 -rw-r--r-- sguha users   670 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_3
4 -rw-r--r-- sguha users   342 bytes 2015-05-19 16:21 /user/sguha/tmp/xw/part_4

> rhread("/user/sguha/tmp/xw/part_1")
Read 2 objects(0.55 KB) in 0.18 seconds
[[1]]
[[1]][[1]]
NULL

[[1]][[2]]
      [,1]
[1,] "1,2008-07,1.4"
[2,] "2,2008-08,21"
[3,] "3,2008-09,3"
[4,] "4,2008-10,4"
[5,] "5,2008-11,4"
[6,] "6,2008-12,1"
[7,] "7,2009-01,1"
[8,] "8,2009-02,3"
[9,] "9,2009-03,1"
[10,] "10,2009-04,3"

[[2]]
[[2]][[1]]
NULL

[[2]][[2]]
      [,1]
[1,] "11,2009-05,1"
[2,] "12,2009-06,5"
[3,] "13,2009-07,7.4"

Example 2:
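# Pair each element of in_mem with its position as the key: list(key, value)
in_mem2 = Map(function(a,b) list(a, b), seq_along(in_mem) , in_mem)
# No kvpairs argument: option 2 relies on kvpairs=TRUE, the default
rhwrite(in_mem2, file="/user/sguha/tmp/xw", chunk=10,numfiles=3)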
> rhread("/user/sguha/tmp/xw/part_1")
Read 2 objects(0.09 KB) in 0.05 seconds
[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] "1,2008-07,1.4"

[[2]]
[[2]][[1]]
[1] 11

[[2]][[2]]
[1] "11,2009-05,3"

You then run the following (since rhwrite writes type 'sequence'):
rhwatch(..., input="/user/sguha/tmp/xw", ...)
There is no need for rhfmt in the input; RHIPE defaults to sequence.

Depending on the data structure of in_mem, Example 1 or 2 might be easier.
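If in_mem is (or is first parsed into) a data frame, the Example 2 approach still applies: key each row by its index. A minimal sketch, assuming the text lines are comma-separated with three fields; the column names id, month and value are hypothetical, since the thread never names the columns.

```r
# Hypothetical sample of comma-separated text lines (values from this thread).
lines <- c("1,2008-07,1.4", "2,2008-08,21", "3,2008-09,3")

# Parse the text into a data frame; the column names are assumptions.
df <- read.csv(text = lines, header = FALSE,
               col.names = c("id", "month", "value"),
               stringsAsFactors = FALSE)

# One list(key, value) pair per row: the key is the row index and the
# value is that row kept as a one-row data frame.
kv <- lapply(seq_len(nrow(df)), function(i) list(i, df[i, , drop = FALSE]))

# kv has the same list-of-pairs shape as in_mem2, so it could be written
# the same way, e.g.
# rhwrite(kv, file = "/user/sguha/tmp/xw", chunk = 10, numfiles = 3)
```

Keeping each value as a one-row data frame preserves the column types; storing just the numeric field instead would make each record smaller.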

HTH

Saptarshi

nae...@ic.ufal.br

May 23, 2015, 2:13:08 AM

to rh...@googlegroups.com, saptars...@gmail.com

Thanks!
It really helped.
It's nice to know I can count on the help of kind people like you.