Memory error with ggplot?


Neotropical bat risk assessments

Apr 24, 2009, 9:08:40 PM
to ggp...@googlegroups.com
Hi all,

I am trying to plot some data. Loading the CSV data file works, but
R then stops in ggplot and I get an out-of-memory error.
>
> d <- read.csv batcalls.rda
Error: unexpected symbol in "d <- read.csv batcalls.rda"
> attach(d)
Error in attach(d) : object "d" not found
> library(ggplot2)
> print(qplot(Sc, Fc))
Error in eval(expr, envir, enclos) : object "Sc" not found
> data(batcalls)
> attach(batcalls)
> print(qplot(Fc,geom="density"))
> b<-kmeans(Fc,c(10,15,18,30))
> print(qplot(Sc,Fc,colour=as.factor(b$cluster)))
>
> print(qplot(Dur,geom="density"))
> b<-kmeans(Dur,c(10,15,18,30))
> print(qplot(Dur,Fc,colour=as.factor(b$cluster)))

> d <- read.csv ("C:/Rainey/RainyAllbats.csv")
> attach(d)

The following object(s) are masked from batcalls :

Dc Dur Fc Fk Fmax Fmean Fmin Qk Qual S1 Sc st TBC Tc Tk

> library(ggplot2)
> print(qplot(Sc, Fc))
Warning messages:
1: In data.frame(..., check.names = FALSE) :
Reached total allocation of 1535Mb: see help(memory.size)
2: In data.frame(..., check.names = FALSE) :
Reached total allocation of 1535Mb: see help(memory.size)
3: Removed 1200249 rows containing missing values (geom_point).
> data(d)
Warning message:
In data(d) : data set 'd' not found
> attach(d)

The following object(s) are masked from d ( position 3 ) :

Dc Dur Fc Filename Fk Fmax Fmean Fmin Qk Qual S1 Sc st TBC Tc Tk


The following object(s) are masked from batcalls :

Dc Dur Fc Fk Fmax Fmean Fmin Qk Qual S1 Sc st TBC Tc Tk

> print(qplot(Fc,geom="density"))
There were 15 warnings (use warnings() to see them)
> b<-kmeans(Fc,c(10,15,18,30))
Error in switch(nmeth, { : NA/NaN/Inf in foreign function call (arg 1)
> f=jpeg(file="Sc x Fc.jpg")
> print(qplot(Sc,Fc,colour=as.factor(b$cluster)))
Error in data.frame(colour = c(3L, 4L, 4L, 4L, 4L, 1L, 4L, 2L, 1L, 1L, :
arguments imply differing number of rows: 9404, 2400489
> dev.off()

Is there any way to tweak this to get it to run?

Using WinXP with 4 GB RAM

Tnx

Bruce

Neotropical bat risk assessments

Apr 25, 2009, 4:41:25 PM
to ggp...@googlegroups.com
Hi all,

I am having problems with the contour plot.
It works well with an abbreviated data set but not with the entire data set:
I am not getting a plot at all.

> library(MASS)
> library(batcalls)
> BR<-kde2d(Sc,Fc)
Error: cannot allocate vector of size 228.9 Mb
> f=jpeg(file="Rainey contour plot.jpg")
> filled.contour(BR)
Error in pretty(zlim, nlevels) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In min(x, na.rm = na.rm) :
  no non-missing arguments to min; returning Inf
2: In max(x, na.rm = na.rm) :
  no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
> dev.off()
null device
          1

Bruce

hadley wickham

Apr 25, 2009, 6:41:37 PM
to Neotropical bat risk assessments, ggp...@googlegroups.com
Hi Bruce,

You're not giving us enough information to suggest any possible
remedies. At a minimum we need to know what Sc and Fc are (the
output of str(Sc) and str(Fc) would be a good start). You are also
getting an out of memory error from kde2d, which is a part of the MASS
package, not ggplot2.
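
For anyone following along, here is a minimal sketch of the kind of summary being asked for. The data frame `d` is a made-up stand-in (the real Sc and Fc come from the CSV file), and the NA counts are included only because the earlier transcript reports rows removed for missing values.

```r
# Hypothetical stand-in for the real data; values are invented for illustration.
set.seed(1)
d <- data.frame(Sc = c(rnorm(5, mean = 15, sd = 5), NA),
                Fc = c(rnorm(5, mean = 25, sd = 2), NA))

str(d$Sc)   # type, length and first few values of Sc
str(d$Fc)   # same for Fc

# Count missing values per column; NA values break kmeans() and distort ranges.
n_missing <- colSums(is.na(d))
print(n_missing)

# Number of rows where both Sc and Fc are present.
n_complete <- sum(complete.cases(d))
print(n_complete)
```

Posting the str() output together with the NA counts usually gives enough detail to tell a data problem from a memory problem.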

Hadley
--
http://had.co.nz/

Neotropical bat risk assessments

Apr 25, 2009, 7:12:15 PM
to ggp...@googlegroups.com, hadley wickham
Hi all,

Hadley suggested more info was needed.

Fc is the characteristic frequency (measured in kHz) and Sc is the characteristic slope (measured in octaves per second);
both are components of bat echolocation calls.

Data is in CSV format with a header row:

Dur,TBC,Fmax,Fmin,Fmean,Fc,S1,Sc,
9.81,0,28.78,24.54,26.49,25.81,48.84,14.78,
4.79,1838.47,37.21,29.41,31.76,29.52,241.77,62.83,
4.21,5.42,28.99,26.23,27.53,27.4,76.03,11.44,
10.69,193.48,30.53,25.4,27.69,25.4,-208.19,26.05,
15.5,248.18,30.77,24.32,26.57,24.92,-202.76,18.64,
14.85,217.47,31.25,24.62,26.93,25.56,-88.4,10.32,
11.86,158.01,33.61,25.24,27.66,25.32,83.32,17.62,
14.05,229.74,30.65,24.24,26.76,25.24,61.87,14.06,
8.71,264.02,31.01,25.72,27.56,25.72,253.18,19.2,
3.91,10.3,25.32,24.02,24.55,24.02,-71.67,16.83,
16.11,242.21,29.85,24.02,26.07,24.62,79.45,19.11,
16.81,246.48,28.57,23.05,25.46,23.81,-179.82,15.95,
16.93,255.09,28.78,23.19,25.75,24.1,-112.21,16.38,
5.12,107.16,32,29.41,30.46,29.41,134.45,20.88,
16.7,150.49,27.97,22.92,24.91,23.95,42.96,16.81
.... etc

It is a big data set with 1,200,240 rows!

The plots below work with a smaller data set.

[Attached images: 4dfe751.jpg, 4dfe761.jpg, 4dfe780.jpg, 4dfe78f.jpg]
The 2D contour plots may turn out to be very valuable, and I will need to figure out better labeling, etc.

I was initially running into some memory issues just reading in the *.CSV data file.
All plots listed below print with the large data set, except the contour plot.
Hadley suggested that the MASS package may be the problem and not ggplot per se.

# The limit can be raised by calling memory.limit within a running R session
# So I added this at the front end.
memory.size(max = TRUE)
library(batcalls)

BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
attach(BR)
library(ggplot2)
f=jpeg(file="Rainey Sc_Fc plot.jpg")
print(qplot(Sc, Fc))
dev.off()       

BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
#BR <- read.csv Reduced bats.csv
attach(BR)
library(ggplot2)
print(qplot(Sc, Fc))


f=jpeg(file="RFcden.jpg")
print(qplot(Fc,geom="density"))
dev.off()
b<-kmeans(Fc,c(10,15,18,30))
f=jpeg(file="Rainey Sc_Fc factor plot.jpg")
print(qplot(Sc,Fc,colour=as.factor(b$cluster)))
dev.off()

print(plot(Dur,geom="density"))

f=jpeg(file="Rainey contour plot.jpg")
BR<-kmeans(Dur,c(10,15,18,30))
print(plot(Dur,Fc,colour=as.factor(b$cluster)))
dev.off()
# so it seems here is the issue with the big data set...
library(MASS)
library(batcalls)
BR<-kde2d(Sc,Fc)

f=jpeg(file="Rainey contour plot.jpg")
filled.contour(BR)
dev.off()

Tnx everyone,
Cheers from the jungles of Belize,

Bruce

Kasper Daniel Hansen

Apr 25, 2009, 8:19:23 PM
to Neotropical bat risk assessments, ggp...@googlegroups.com, hadley wickham
Well, that is a big surface to do a density estimation on.

You are probably just running out of memory. You can diagnose it a bit
better with
gc()
which reports how much memory you have used in a session. (So run your
small, working examples and then run gc().) You can do this for a
couple of example sizes and see how memory scales with the number of rows.
Additionally, you might want to know whether your version of R is 64-bit
(check .Machine$sizeof.pointer: if it has the value 8 you are on 64-bit, if
it has the value 4 you are on 32-bit).
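
A rough sketch of that diagnosis, under some assumptions: the data are synthetic (rnorm) rather than the real bat calls, the three sizes are arbitrary, and kde2d() is called with a 25 x 25 grid. It resets the gc() statistics before each run and then reads back the "max used" figure.

```r
library(MASS)  # provides kde2d()

# 8 means a 64-bit build of R, 4 means a 32-bit build.
pointer_bytes <- .Machine$sizeof.pointer
cat("R build:", 8 * pointer_bytes, "bit\n")

# Synthetic stand-in data; the sizes are arbitrary illustration values.
sizes <- c(1000, 10000, 100000)
peak_mb <- sapply(sizes, function(n) {
  x <- rnorm(n)
  y <- rnorm(n)
  gc(reset = TRUE)              # reset the "max used" statistics
  dens <- kde2d(x, y, n = 25)   # the step whose memory use we want to see
  g <- gc()
  sum(g[, ncol(g)])             # last column is "max used" in Mb (Ncells + Vcells)
})
print(data.frame(rows = sizes, peak_mb = peak_mb))
```

If the peak grows roughly in proportion to the number of rows, extrapolating to the full data set gives an estimate of whether it can fit at all.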

You probably need to re-think your computation.
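
One possible way to re-think it, offered as a sketch and not as the only option: drop rows with missing values and estimate the density from a random subsample. The data frame, the NA rate, the subsample size of 20000 and the 50 x 50 grid are all assumptions made for illustration. The last lines check that the 228.9 Mb in the error message is consistent with a 25 x 1,200,240 matrix of 8-byte doubles, 25 being the default grid size of kde2d().

```r
library(MASS)  # provides kde2d()
set.seed(42)

# Synthetic stand-in for the real Sc and Fc columns (invented values).
n <- 200000
d <- data.frame(Sc = rnorm(n, mean = 17, sd = 5),
                Fc = rnorm(n, mean = 26, sd = 2))
d$Sc[sample(n, 1000)] <- NA   # simulate some missing values

# Keep only rows where both variables are present.
ok <- d[complete.cases(d[, c("Sc", "Fc")]), ]

# Estimate the density from a random subsample instead of every row.
sub <- ok[sample(nrow(ok), 20000), ]
dens <- kde2d(sub$Sc, sub$Fc, n = 50)

filled.contour(dens, xlab = "Sc", ylab = "Fc")

# Size of one 25 x 1,200,240 matrix of doubles, in Mb.
mb_needed <- 1200240 * 25 * 8 / 1024^2
print(round(mb_needed, 1))
```

Repeating the estimate with a few different subsamples is a cheap way to check that the contours are stable before committing to a subsample size.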

Kasper