Looping through a data.frame


Per Nyfelt

Jan 26, 2019, 1:50:38 PM
to Renjin
If I have a data.frame as follows:

df <- data.frame("SN" = 1:2, "Age" = c(48,17), "Name" = c("Per","Ian"))


then the Name column is stored as a factor:

str(df)
'data.frame': 2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  48 17
 $ Name: Factor w/ 2 levels "Ian","Per": 2 1
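The str() output shows why "Per" ends up as 2. By default, factor() takes the sorted unique strings as the levels ("Ian", "Per") and stores each element as its 1-based position in that list. The Java sketch below mimics that default encoding without Renjin. FactorEncoding is a hypothetical class, and Java's natural String ordering is an assumption standing in for R's locale-dependent sort, so mixed-case or non-ASCII labels may be ordered differently than in R.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical helper that mimics how R's factor() encodes a character
 * vector by default: levels are the sorted unique values, and each element
 * is stored as a 1-based index into those levels.
 * Assumption: Java's natural String order stands in for R's collation.
 * NA handling is omitted (null input would throw).
 */
final class FactorEncoding {
    final String[] levels; // like the "levels" attribute
    final int[] codes;     // like the underlying integer vector

    private FactorEncoding(String[] levels, int[] codes) {
        this.levels = levels;
        this.codes = codes;
    }

    static FactorEncoding encode(String[] values) {
        // Sorted, de-duplicated levels, similar to sort(unique(x)) in R.
        String[] levels = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
        int[] codes = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // levels is sorted in the same natural order, so binary search finds the 0-based slot;
            // R factor codes are 1-based, hence the +1.
            codes[i] = Arrays.binarySearch(levels, values[i]) + 1;
        }
        return new FactorEncoding(levels, codes);
    }
}
```

For example, FactorEncoding.encode(new String[]{"Per", "Ian"}) yields levels {"Ian", "Per"} and codes {2, 1}, matching the str() output.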

When I loop through that ListVector in Java and extract the values using getElementAsObject, I get the factor codes (i.e. 2 and 1) rather than the values (Per and Ian). How do I fetch the actual values?

Best regards,
Per

Bertram, Alexander

Jan 27, 2019, 3:50:54 AM
to renji...@googlegroups.com
Hi Per,
in your first statement, the Name column was automatically converted to an R language "factor" object by data.frame(). 

In the R language, a factor object is actually an integer vector with class "factor" and an attribute "levels", which is a character vector containing the labels. Each integer in the vector is a 1-based index into levels, so a value of 2 refers to the second label.
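To make the 1-based indexing concrete, here is a Renjin-free Java sketch that turns factor codes back into labels. FactorDecoding and NA_INTEGER are hypothetical names, and treating Integer.MIN_VALUE as NA is an assumption based on how GNU R stores NA_integer_. In Renjin, the codes would come from the column's getElementAsInt(i) and the labels from its "levels" attribute, as in the question.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that maps factor codes back to their labels.
 * Assumptions: codes are 1-based indices into levels (as R defines
 * factors), and a missing value is stored as Integer.MIN_VALUE.
 */
final class FactorDecoding {
    static final int NA_INTEGER = Integer.MIN_VALUE;

    static List<String> decode(int[] codes, String[] levels) {
        List<String> labels = new ArrayList<>(codes.length);
        for (int code : codes) {
            if (code == NA_INTEGER) {
                labels.add(null); // NA has no label
            } else if (code < 1 || code > levels.length) {
                throw new IllegalArgumentException("Invalid factor code: " + code);
            } else {
                labels.add(levels[code - 1]); // convert 1-based code to 0-based array index
            }
        }
        return labels;
    }
}
```

For the two-row example, decode(new int[]{2, 1}, new String[]{"Ian", "Per"}) returns [Per, Ian].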

From Java, you can access the class and levels attributes using the getAttribute() method of the SEXP instance.

Note that you can control whether data.frame() converts strings into factors with the named argument stringsAsFactors; passing stringsAsFactors = FALSE keeps them as character vectors.

HTH,
Alex


--
Alexander Bertram
Technical Director
BeDataDriven BV

Web: http://bedatadriven.com
Email: al...@bedatadriven.com
Tel. Nederlands: +31(0)647205388
Skype: akbertram

Per Nyfelt

Jan 27, 2019, 5:27:04 AM
to Renjin
Thanks Alex!

If I do the following, it works, but the line
row.add(vec.getElementAsObject(col.getElementAsInt(i)-1));

in the transpose method looks fragile to me, since it depends on how the factor levels are indexed, which feels like an implementation detail. Is there a better way?

import org.junit.jupiter.api.Test;
import org.renjin.eval.Context;
import org.renjin.primitives.Types;
import org.renjin.script.RenjinScriptEngine;
import org.renjin.script.RenjinScriptEngineFactory;
import org.renjin.sexp.*;

import javax.script.ScriptException;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

public class DataFrameTest {

    @Test
    public void testDataFrame() throws ScriptException {
        RenjinScriptEngineFactory factory = new RenjinScriptEngineFactory();
        RenjinScriptEngine engine = factory.getScriptEngine();
        engine.eval("df <- data.frame('SN' = 1:3, 'Age' = c(48, 17, 49), 'Name' = c('Per', 'Ian', 'Louise'))");

        Environment global = engine.getSession().getGlobalEnvironment();
        Context topContext = engine.getSession().getTopLevelContext();

        ListVector df = (ListVector) global.getVariable(topContext, "df");

        // Collect the column names from the "names" attribute
        List<String> colList = new ArrayList<>();
        if (df.hasAttributes()) {
            AttributeMap attributes = df.getAttributes();
            Map<Symbol, SEXP> attrMap = attributes.toMap();
            Symbol s = attrMap.keySet().stream().filter(p -> "names".equals(p.getPrintName())).findAny().orElse(null);
            Vector colNames = (Vector) attrMap.get(s);
            for (int i = 0; i < colNames.length(); i++) {
                colList.add(colNames.getElementAsString(i));
            }
        }

        // Each element of the data.frame list is one column
        List<Vector> table = new ArrayList<>();
        for (SEXP col : df) {
            Vector column = (Vector) col;
            table.add(column);
        }
        List<List<Object>> rowList = transpose(table);

        assertThat("First row, SN does not match", rowList.get(0).get(0), is(1));
        assertThat("First row, Age does not match", rowList.get(0).get(1), is(48.0));
        assertThat("First row, Name does not match", rowList.get(0).get(2), is("Per"));

        assertThat("Last row, SN does not match", rowList.get(2).get(0), is(3));
        assertThat("Last row, Age does not match", rowList.get(2).get(1), is(49.0));
        assertThat("Last row, Name does not match", rowList.get(2).get(2), is("Louise"));
    }

    private List<List<Object>> transpose(List<Vector> table) {
        List<List<Object>> ret = new ArrayList<>();
        final int N = table.get(0).length();
        for (int i = 0; i < N; i++) {
            List<Object> row = new ArrayList<>();
            for (Vector col : table) {
                if (Types.isFactor(col)) {
                    // Look up the factor's "levels" attribute and map the code to its label
                    AttributeMap attributes = col.getAttributes();
                    Map<Symbol, SEXP> attrMap = attributes.toMap();
                    Symbol s = attrMap.keySet().stream().filter(p -> "levels".equals(p.getPrintName())).findAny().orElse(null);
                    Vector vec = (Vector) attrMap.get(s);
                    row.add(vec.getElementAsObject(col.getElementAsInt(i) - 1)); // works but is this really how to do it?
                } else {
                    row.add(col.getElementAsObject(i));
                }
            }
            ret.add(row);
        }
        return ret;
    }
}
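One possible restructuring, sketched here without Renjin so that it runs on its own, is to resolve each factor column's labels once, before building rows, instead of fetching the levels attribute again for every cell. The Column and RowTranspose classes are hypothetical stand-ins for Renjin's Vector; with Renjin, the equivalent steps would use Types.isFactor(col), col.getElementAsInt(i) and the column's "levels" attribute, as in the test. The code - 1 offset is kept because R's factor representation defines the codes as 1-based positions in levels, so it should be safe to rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Renjin-free model of transposing data frame columns into rows. */
final class RowTranspose {

    /** A column is either plain values, or a factor (1-based codes plus levels). */
    static final class Column {
        final List<Object> values; // plain column values (null for factor columns)
        final int[] codes;         // 1-based factor codes (null for plain columns)
        final String[] levels;     // factor levels (null for plain columns)

        private Column(List<Object> values, int[] codes, String[] levels) {
            this.values = values;
            this.codes = codes;
            this.levels = levels;
        }

        static Column plain(Object... values) {
            return new Column(Arrays.asList(values), null, null);
        }

        static Column factor(int[] codes, String... levels) {
            return new Column(null, codes, levels);
        }

        boolean isFactor() {
            return codes != null;
        }

        int length() {
            return isFactor() ? codes.length : values.size();
        }
    }

    static List<List<Object>> transpose(List<Column> table) {
        int nrow = table.isEmpty() ? 0 : table.get(0).length();

        // Decode every factor column up front, so the row loop does no attribute lookups.
        List<List<Object>> decoded = new ArrayList<>(table.size());
        for (Column col : table) {
            if (col.isFactor()) {
                List<Object> labels = new ArrayList<>(col.codes.length);
                for (int code : col.codes) {
                    // Assumption: Integer.MIN_VALUE marks NA, as in GNU R's NA_integer_.
                    labels.add(code == Integer.MIN_VALUE ? null : col.levels[code - 1]);
                }
                decoded.add(labels);
            } else {
                decoded.add(col.values);
            }
        }

        // Assemble rows by taking the i-th element of every decoded column.
        List<List<Object>> rows = new ArrayList<>(nrow);
        for (int i = 0; i < nrow; i++) {
            List<Object> row = new ArrayList<>(decoded.size());
            for (List<Object> column : decoded) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the three-row data from the test (Name stored as codes {3, 1, 2} over levels "Ian", "Louise", "Per"), transpose returns [1, 48.0, Per] as the first row and [3, 49.0, Louise] as the last.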

Best regards,
Per