How to calculate the in-memory object size of a CSV file

Yanish Pradhananga

Oct 8, 2014, 2:04:00 PM

to rha...@googlegroups.com

I read the file reg.csv, which is 93 MB in size:
library(rhdfs)
hdfs.init();
f = hdfs.file("/user/sampledata/reg.csv","r",buffersize=104857600);
m = hdfs.read(f);
c = rawToChar(m);
data = read.table(textConnection(c), sep = ",",header = TRUE);
head(data)
object.size(data)
116808 bytes
object.size(c)
65632 bytes
object.size(m)
65576 bytes
object.size(f)
264 bytes
I got the results above. How is this object size allocated, and why do I get these object sizes?
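The reported sizes can be related to the bytes actually held in memory: object.size() of a raw vector is its length in bytes plus a small header whose size depends on the R version and platform. In the output above, 65576 bytes for m is consistent with 65536 bytes (64 KiB) of data plus a 40-byte header, i.e. far less than the 93 MB file. A minimal base-R sketch, assuming a synthetic 64 KiB vector (the names m_demo and s_demo are illustrative) as a stand-in for the result of hdfs.read(), so no HDFS connection is needed; the exact header sizes it prints vary between R versions:

```r
# Synthetic 64 KiB raw vector standing in for the bytes returned by hdfs.read()
n_bytes <- 65536L
m_demo <- as.raw(rep(65L, n_bytes))   # 65 is the ASCII code for "A"

# object.size() = payload bytes + a small, version-dependent header
raw_overhead <- as.numeric(object.size(m_demo)) - n_bytes
cat("raw vector:", length(m_demo), "bytes of data,", raw_overhead, "bytes of header\n")

# rawToChar() gives a length-1 character vector; its size is still dominated
# by the 65536 characters it holds, not by the size of the file on HDFS
s_demo <- rawToChar(m_demo)
cat("string:", nchar(s_demo), "characters, object.size =", format(object.size(s_demo)), "\n")
```

So object.size() measures what was read into the session; an object of about 65 KB means only about 64 KiB of the file made it into R.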

Antonio Piccolboni

Oct 8, 2014, 2:12:39 PM

to RHadoop Google Group

I am not sure which object sizes are wrong here, but I know there is a separate call in rhdfs for reading text files, so I would start by looking into that one:

rhdfs::hdfs.read.text.file

A
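Besides hdfs.read.text.file, the object.size(m) of 65576 bytes for a 93 MB file suggests that the single hdfs.read(f) call returned only the first 64 KiB. One way around that, sketched here under assumptions, is to keep calling the reader until it signals end of input and then join the pieces. read_all() is a hypothetical helper, not part of rhdfs; to stay runnable without a cluster, the demo uses a local temporary file and base R readBin() capped at 64 KiB per call as a stand-in for hdfs.read():

```r
# Hypothetical helper (not part of rhdfs): call read_chunk() until it returns
# NULL or a zero-length raw vector, then concatenate the chunks in order.
read_all <- function(read_chunk) {
  chunks <- list()
  repeat {
    piece <- read_chunk()
    if (is.null(piece) || length(piece) == 0L) break
    chunks[[length(chunks) + 1L]] <- piece
  }
  if (length(chunks) == 0L) return(raw(0))
  unlist(chunks, use.names = FALSE)   # list of raw vectors -> one raw vector
}

# Local stand-in for reg.csv: a header line plus 50000 two-column rows
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", sprintf("%d,%d", 1:50000, 50000:1)), path)

# One capped read, like the single hdfs.read(f) call in the thread
first_chunk <- readBin(path, what = "raw", n = 65536L)

# Repeated capped reads until readBin() returns an empty vector at end of file
con <- file(path, open = "rb")
all_bytes <- read_all(function() readBin(con, what = "raw", n = 65536L))
close(con)

df_first <- read.csv(text = rawToChar(first_chunk), fill = TRUE)  # last row may be cut off
df_all   <- read.csv(text = rawToChar(all_bytes))

cat("one read:", nrow(df_first), "rows; looped reads:", nrow(df_all), "rows\n")
unlink(path)
```

On a cluster the stand-in would become something like read_all(function() hdfs.read(f)). That assumes, without confirmation from this thread, that successive hdfs.read() calls continue from the current position and return NULL or an empty vector at end of file, so check the rhdfs documentation first. The loop collects chunks in a list and joins them once, which avoids re-copying the growing raw vector on every iteration.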

Yanish Pradhananga

Oct 8, 2014, 4:43:18 PM

to rha...@googlegroups.com, ant...@piccolboni.info

library(rhdfs)
hdfs.init();
f = hdfs.file("/user/sampledata/reg.csv","r",buffersize=104857600);
m = hdfs.read(f);
c = rawToChar(m);

dataa = read.table(textConnection(c), sep = ",",header = TRUE,fill = TRUE);
head(dataa) # works and shows the head of dataa; reg.csv is 93 MB

g=hdfs.file("/user/sampledata/halffreg.csv","r",buffersize=104857600); # halffreg.csv is 26 MB, about one third of the same file
h=hdfs.read(g);
d=rawToChar(h);
sata=read.table(textConnection(c),sep=",",header=TRUE,fill=TRUE); # note: this parses c (text from reg.csv), not d

> object.size(c)
65632 bytes
> object.size(m)
65576 bytes
> object.size(f)
264 bytes

> object.size(dataa)
116808 bytes
> object.size(g)
264 bytes
> object.size(h)
65576 bytes
> object.size(d)
65632 bytes
> object.size(sata)
116808 bytes
> ncol(sata)
[1] 11
> ncol(dataa)
[1] 11
> nrow(sata)
[1] 1965
> nrow(dataa)
[1] 1965

Why do I get the same object size for different data, and why does nrow() show 1965 for both datasets? The actual number of rows is 2801660 in reg.csv and 1000830 in halffreg.csv; both CSV files have 11 columns. head(dataa) and head(sata) both work fine and display the first rows.
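The row count can be sanity-checked with rough arithmetic. (Both data frames report the same count because the line defining sata parses c, the text read from reg.csv, rather than d.) Assuming "93Mb" means 93 × 10^6 bytes and that rows are of roughly equal length, reg.csv averages about 33 bytes per row, so a 65536-byte chunk should hold on the order of 2000 rows, in line with the 1965 reported. A back-of-the-envelope sketch:

```r
# Figures from the post; "93Mb" is assumed to mean 93 * 10^6 bytes
file_bytes  <- 93e6
total_rows  <- 2801660
chunk_bytes <- 65536   # chunk size implied by object.size(m) = 65576 bytes

bytes_per_row <- file_bytes / total_rows      # average row length, header ignored
rows_in_chunk <- chunk_bytes / bytes_per_row  # rows expected in one 64 KiB chunk

cat(sprintf("%.1f bytes/row -> about %.0f rows per 64 KiB chunk\n",
            bytes_per_row, rows_in_chunk))
```

With these inputs the estimate is roughly 1,970 rows; the small gap to 1965 is expected, since row lengths vary and the final, truncated row is only partially present.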

Antonio Piccolboni

Oct 8, 2014, 10:02:45 PM

to RHadoop Google Group

That I don't know. It's probably a question for the R-help mailing list.

Antonio
