RCall conventions regarding R's garbage collection

Douglas Bates

Jan 7, 2015, 11:59:20 AM
to julia...@googlegroups.com
There are conflicting objectives in deciding how to treat R objects obtained through the RCall interface with respect to R's garbage collection. It is desirable to have Julia and R access the same memory for the contents if, for example, you want to modify a value in Julia and have the modification available in R.  In that case you want both systems to agree on who gets to do garbage collection and whether the contents are protected.  On the other hand, R is conservative about protecting objects in function calls (an R function should not modify the contents of an argument), which can lead to extravagant memory usage if you want to update a large R object for, say, new parameter values.

I think the best solution is to establish a naming convention about who owns the memory and who protects it.  RCall has a 'rawvector' method that returns the unprotected contents of an R object as a Julia vector.  I think that, in keeping with the "consenting adults" approach of Julia, there should be a way to return the raw, potentially unprotected, vector.  Currently there is a DataArray method for various types of R objects that makes them visible as DataArray types.  These should probably be protected if they are to share the same storage between R and Julia.  To avoid excessive memory usage, there should be a Julia finalizer method that calls the R interface function R_ReleaseObject on the object (which means that the original R SEXP has to be retained somewhere).  I think this will require new types that extend DataArrays.AbstractDataArray.  I can't see an easy way of retaining the R SEXP for a DataArrays.DataArray type.

Another situation that Randy Lai and I were discussing in issue #7 is the case where an R object is immediately copied into a Julia object.  It is probably not a big deal to protect and unprotect the object for the duration of the copy operation.
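The protect-for-the-duration-of-the-copy pattern might be sketched as follows. R is not loaded here: `MockSEXP`, `protect!`, and `unprotect!` are hypothetical stand-ins for an R vector and for R's PROTECT/UNPROTECT stack, so this only illustrates the bracketing, and it uses current Julia syntax.

```julia
# Hypothetical stand-in for an R vector; real code would hold a pointer into R's heap.
struct MockSEXP
    data::Vector{Float64}
end

# Mock of R's protection stack: PROTECT pushes, UNPROTECT(n) pops n entries.
const PROTECT_STACK = MockSEXP[]

function protect!(s::MockSEXP)
    push!(PROTECT_STACK, s)
    return s
end

function unprotect!(n::Integer)
    for _ in 1:n
        pop!(PROTECT_STACK)
    end
    return nothing
end

# Copy the contents into Julia-owned memory, protecting the R object
# only while the copy is in progress.
function copy_to_julia(s::MockSEXP)
    protect!(s)
    try
        return copy(s.data)   # independent copy owned by Julia
    finally
        unprotect!(1)         # stays balanced even if the copy throws
    end
end

s = MockSEXP([1.0, 2.0, 3.0])
v = copy_to_julia(s)
```

Because the result is a copy, nothing needs to stay protected once `copy_to_julia` returns.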

Right now it seems to me that there should be types with names like RDataArray, RPooledDataArray and perhaps RDataFrame that use the storage from R but protect it from garbage collection and retain the R SEXP for a finalizer.  Calling DataArray or DataFrame on such an object results in its being copied to memory locations allocated by Julia.  Does that make sense?
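A minimal sketch of what such a type could look like, in current Julia syntax and without R loaded. Everything here is an assumption made for illustration: `RVector` is a hypothetical simplified cousin of the proposed RDataArray, `preserve!` and `release!` merely count calls where real code would call R_PreserveObject and R_ReleaseObject, and the `data` field stands in for storage shared with R.

```julia
# Mock bookkeeping: number of "R objects" currently preserved.
# An atomic counter is used because finalizers may run at arbitrary times.
const N_PRESERVED = Threads.Atomic{Int}(0)
preserve!(sexp::UInt) = (Threads.atomic_add!(N_PRESERVED, 1); nothing)
release!(sexp::UInt) = (Threads.atomic_sub!(N_PRESERVED, 1); nothing)

# Hypothetical vector type that keeps the SEXP so a finalizer can release it.
mutable struct RVector{T} <: AbstractVector{T}
    sexp::UInt            # stand-in for the retained R SEXP address
    data::Vector{T}       # stand-in for the storage shared with R
    function RVector{T}(sexp::UInt, data::Vector{T}) where {T}
        rv = new{T}(sexp, data)
        preserve!(sexp)                         # protect from R's collector
        finalizer(x -> release!(x.sexp), rv)    # release when Julia drops rv
        return rv
    end
end
RVector(sexp::UInt, data::Vector{T}) where {T} = RVector{T}(sexp, data)

# Minimal AbstractVector interface; writes go to the shared storage.
Base.size(rv::RVector) = size(rv.data)
Base.getindex(rv::RVector, i::Int) = rv.data[i]
Base.setindex!(rv::RVector, x, i::Int) = (rv.data[i] = x)

rv = RVector(UInt(0x1000), [1.0, 2.0])   # made-up address, never dereferenced
rv[1] = 10.0                             # modifies the shared storage
jl = collect(rv)                         # copy into Julia-owned memory
jl[2] = -1.0                             # does not touch rv
```

Here `collect` plays the role that calling DataArray or DataFrame on the R-backed object would play: the result no longer depends on R keeping the original alive.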

Douglas Bates

Jan 7, 2015, 4:42:01 PM
to julia...@googlegroups.com
On looking at the documentation for finalizer it seems that one defines a finalizer for an object, not for a type.  That may mean that it is possible to use the existing DataArrays.DataArray or DataArrays.PooledDataArray types if the constructor that takes an SEXP argument can somehow keep track of the SEXP that generated the array.  Sounds like a closure to me. Another possibility for atomic vectors is to use the fact that the pointer to the array contents is a fixed offset (called voffset) from the Ptr{Void} that is the SEXP.
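The closure idea could look roughly like this in current Julia syntax. This is a sketch under assumptions, not RCall code: `mock_preserve` and `mock_release` are hypothetical stand-ins for R_PreserveObject and R_ReleaseObject, `attach_release!` is an invented helper name, and the pointer value is made up and never dereferenced.

```julia
# Hypothetical stand-ins for R_PreserveObject / R_ReleaseObject:
# they only count how many objects are currently preserved.
const LIVE = Threads.Atomic{Int}(0)
mock_preserve(sexp::Ptr{Cvoid}) = (Threads.atomic_add!(LIVE, 1); nothing)
mock_release(sexp::Ptr{Cvoid}) = (Threads.atomic_sub!(LIVE, 1); nothing)

# Attach the finalizer to the array object itself. The closure captures
# `sexp`, so no new wrapper type is needed to remember what to release.
function attach_release!(v::Vector, sexp::Ptr{Cvoid})
    mock_preserve(sexp)
    finalizer(_ -> mock_release(sexp), v)
    return v
end

sexp = Ptr{Cvoid}(UInt(0x2000))              # made-up address
v = attach_release!([1.0, 2.0, 3.0], sexp)
```

The finalizer is tied to this particular array object, so it fires when `v` itself is collected, regardless of which container happens to hold it.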

Randy Lai

Jan 7, 2015, 6:33:29 PM
to julia...@googlegroups.com
We perhaps need something like

type SEXP{N}                       # N is the R type value (e.g. 1=>SYMSXP)
    p::Ptr{Void}                   # address of the R object
    function SEXP(p::Ptr{Void})
        s = new(p)
        PreserveObject(s)          # presumably wraps R_PreserveObject
        finalizer(s, ReleaseObject)  # release once Julia collects s
        s
    end
end

Randy Lai

Jan 7, 2015, 6:36:34 PM
to julia...@googlegroups.com
But here I am preserving ALL pointers that are exposed to Julia, which is obviously too much.

Douglas Bates

Jan 8, 2015, 11:19:34 AM
to julia...@googlegroups.com
I now think that it would be best to write DataArray constructors from an SEXP and have them do the call to R_PreserveObject with a finalizer that calls R_ReleaseObject.  I was going to use the voffset to calculate the original location that the SEXP pointed to.  That would work in almost all cases but would fail spectacularly if the user replaced the data member in the DataArray object.  I'll need to check out whether a closure will work.

Randy Lai

Jan 8, 2015, 3:34:56 PM
to julia...@googlegroups.com
It may be worth renaming the `rawvector` function to `unsafe_rawvector`. I think the Julia convention is to give functions that are not safe an `unsafe_` prefix.

You mentioned that `protect` and `unprotect` could be used for as long as the `rawvector` exists, but I believe that in some situations we would like to keep the vector alive throughout the process. Besides `DataArray` objects, it is equally important to have `Array` constructors that use the Preserve and Release mechanism.

Douglas Bates

Jan 8, 2015, 4:25:26 PM
to julia...@googlegroups.com
I ended up taking a slightly different approach.  I wrote methods for Base.vec that call R_PreserveObject on the SEXP and install a finalizer to call R_ReleaseObject.  The one exception is STRSXP (a vector of character strings) where the Base.vec method creates the vector by calling Base.bytestring on the individual CHARSXP and Base.bytestring copies the string.

Base.vec returns a Vector.  The methods for dataset return an object that reflects the R type.  Vectors or arrays are returned as DataArrays, and data.frames are returned as DataFrames.

I also added methods for Base.size that return the size in the Julia sense.  (R's dim function returns NULL for a vector, whereas Base.size returns the 1-tuple (length(v),).)
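The difference between the two size conventions can be illustrated without R. In this sketch `julia_size` is a hypothetical helper name, and `nothing` stands in for the NULL that R's dim returns for a plain vector.

```julia
# `dim` plays the role of R's dim attribute; `nothing` stands in for R's NULL.
# A plain vector has no dim attribute, so fall back to a 1-tuple of its length.
julia_size(dim::Nothing, len::Integer) = (Int(len),)

# Matrices and arrays carry a dim attribute, which maps directly to a tuple.
julia_size(dim::AbstractVector{<:Integer}, len::Integer) = Tuple(Int.(dim))

vector_size = julia_size(nothing, 5)    # plain vector of length 5
matrix_size = julia_size([2, 3], 6)     # 2 x 3 matrix
```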