using table() function for all categorical variables in a data set


C G Venkatesh

Jan 7, 2015, 12:41:54 AM
to rro...@googlegroups.com
I want to write a function that creates table output for all categorical variables (say v1, v2, v3, v4, ...) in the data set ev_table, using the table() function:

tab <- data.frame(table(ev_table$v1, ev_table$target_var))

I want to do this without repeating the same code for every variable.

My attempts to paste the variable names into the code lead to errors, because the quotes get carried along with the name (the character value) of the variable.

Any help is welcome. Thanks in advance.
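A minimal sketch of such a function, assuming ev_table is a data frame whose categorical columns are stored as factors. The columns v1, v2 and target_var below are made-up stand-ins, and tables_by_target() is a hypothetical helper, not a base R function. Indexing with df[[v]] looks a column up by a name held in a character string, which sidesteps the quoting problem that comes from pasting names into code.

```r
# Made-up stand-in for ev_table: two categorical variables plus a target
ev_table <- data.frame(
  v1 = factor(c("a", "b", "a", "b", "a")),
  v2 = factor(c("x", "x", "y", "y", "y")),
  target_var = factor(c("yes", "no", "yes", "yes", "no"))
)

# Hypothetical helper: cross-tabulate each named column against the target
# and return a named list with one long-format count table per variable
tables_by_target <- function(df, vars, target) {
  out <- lapply(vars, function(v) {
    # df[[v]] takes the column name as a string; dnn labels the dimensions
    as.data.frame(table(df[[v]], df[[target]], dnn = c(v, target)))
  })
  names(out) <- vars
  out
}

# Every factor column except the target itself
cat_vars <- setdiff(names(ev_table)[sapply(ev_table, is.factor)], "target_var")
result <- tables_by_target(ev_table, cat_vars, "target_var")
print(result$v1)
```

Each element of result is the same kind of count table as data.frame(table(...)) in the question, but with its columns named after the variables instead of Var1 and Var2.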

stephen...@revolution-computing.com

Jan 7, 2015, 3:23:09 PM
to rro...@googlegroups.com
There are a number of ways you could do this.

One way would be to use sapply() to determine which columns in your data frame are of type 'factor'.
Then you can use the ftable() function to compute the cell counts.

Here is an example. Let me know if this doesn't accomplish what you are looking to do. The data I am using in this example is the Airline data that comes with Revolution R Professional. You can download it from here: http://packages.revolutionanalytics.com/datasets/AirOnTimeCSV2012/

colNames.type <- sapply(testDF, "data.class")
> colNames.type
             Year             Month        DayofMonth         DayOfWeek
         "factor"          "factor"         "numeric"          "factor"
          DepTime        CRSDepTime           ArrTime        CRSArrTime
        "numeric"         "numeric"         "numeric"         "numeric"
    UniqueCarrier         FlightNum           TailNum ActualElapsedTime
         "factor"          "factor"          "factor"         "numeric"
   CRSElapsedTime           AirTime          ArrDelay          DepDelay
        "numeric"         "numeric"         "numeric"         "numeric"
           Origin              Dest          Distance            TaxiIn
         "factor"          "factor"         "numeric"         "numeric"
          TaxiOut         Cancelled  CancellationCode          Diverted
        "numeric"         "logical"          "factor"         "logical"
     CarrierDelay      WeatherDelay          NASDelay     SecurityDelay
        "numeric"         "numeric"         "numeric"         "numeric"
LateAircraftDelay
        "numeric"

colNames.type <- names(colNames.type)[colNames.type =="factor"]
> colNames.type
[1] "Year"             "Month"            "DayOfWeek"        "UniqueCarrier"  
[5] "FlightNum"        "TailNum"          "Origin"           "Dest"           
[9] "CancellationCode"
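One caveat with matching on "factor": data.class() reports the first element of an object's class, so an ordered factor comes back as "ordered" and would be left out of the selection above. A small check on a made-up data frame (the column names are illustrative only) shows the difference from is.factor(), which is TRUE for both plain and ordered factors.

```r
# Made-up data frame mixing a plain factor, an ordered factor and a numeric
df <- data.frame(
  carrier = factor(c("AA", "DL", "AA")),
  dow     = factor(c("1", "2", "1")),
  size    = factor(c("S", "L", "M"), levels = c("S", "M", "L"), ordered = TRUE),
  delay   = c(5, 0, 12)
)

print(sapply(df, data.class))

# data.class() gives "ordered" for 'size', so this selection misses it
by_class <- names(df)[sapply(df, data.class) == "factor"]
# is.factor() catches plain and ordered factors alike
by_is_factor <- names(df)[sapply(df, is.factor)]
print(by_class)
print(by_is_factor)
```

If none of your categorical columns are ordered factors, the two selections agree.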

myformula <- as.formula(paste(colNames.type[4], "~", colNames.type[3], "+", colNames.type[7], sep = ""))
ftable(myformula, data = testDF)

> ftable(UniqueCarrier ~ DayOfWeek + Origin, data = testDF)
                 UniqueCarrier  HP  WN  CO  US  DL  UA  AA  NW  AS  TW  AQ
DayOfWeek Origin                                                         
1         ATL                   12   0   0  25 592   5   0   0   0  35   0
          AUS                   25  20   0   0  14   5  15   0   0  20   0
          BHM                    0  20   0   0   0   0   0   0   0   0   0
          BNA                    0  60   0  20   5   0   0   0   0  30   0
          BOS                   20   0   0  60  30  34  50   0   0  40   0
...
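As an alternative to pasting the formula together as text, base R's reformulate() builds a formula directly from character vectors of variable names, so no quoting issues arise. The sketch below uses a tiny made-up data frame (flights) in place of testDF, keeping the same column names as the example above; the counts it produces are illustrative only.

```r
# Made-up data standing in for testDF, with the same column names
flights <- data.frame(
  UniqueCarrier = factor(c("DL", "DL", "WN", "AA", "WN")),
  DayOfWeek     = factor(c("1", "1", "2", "1", "2")),
  Origin        = factor(c("ATL", "ATL", "AUS", "BOS", "AUS"))
)

factor_cols <- names(flights)[sapply(flights, is.factor)]

# reformulate() builds  response ~ term1 + term2  from character strings
myformula <- reformulate(factor_cols[2:3], response = factor_cols[1])
print(myformula)

# Left-hand side becomes the column variables, right-hand side the row variables
tab <- ftable(myformula, data = flights)
print(tab)
```

The resulting formula is UniqueCarrier ~ DayOfWeek + Origin, the same one the paste() call produces, so it can be passed to ftable() unchanged.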




Stephen Weller
Revolution Analytics Quality Assurance & Technical Support

Revolution R Plus

Subscribe to Technical Support & Indemnification for R

stephen...@revolution-computing.com

Jan 7, 2015, 3:24:46 PM
to rro...@googlegroups.com
The last line should have been:

ftable(myformula, data = testDF)


Stephen Weller