Hi
as.integer(as.character(data$age))
is the simpler approach, rather than
as.integer(levels(data$age)[data$age])
i.e., convert the factor to a character vector, whose elements are the level labels,
and then convert those strings to integers.
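A small sketch comparing the two idioms on a made-up factor of ages (the vector `age` is hypothetical, standing in for `data$age`):

```r
# Hypothetical example data: ages stored as a factor, as can happen when
# a numeric-looking column is read in as strings and turned into a factor.
age <- factor(c("17", "18", "17", "25"))

# Approach 1: factor -> character (the level labels) -> integer
a1 <- as.integer(as.character(age))

# Approach 2: index the levels by the factor (indexing with a factor uses
# its integer codes), then convert the resulting strings to integer
a2 <- as.integer(levels(age)[age])

print(a1)          # [1] 17 18 17 25
identical(a1, a2)  # TRUE
```

Both give the actual ages; the first reads more directly, the second does the string-to-integer conversion only once per level.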
Ideally, if possible, avoid the factor altogether: when reading the data from a file, supply the colClasses argument
to specify the expected type of each column. Either conversion approach works, though.
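A minimal sketch of the colClasses route, assuming a two-column CSV with `name` and `age` columns (the column names and contents are made up; inline `text =` is used here in place of a real file path):

```r
# Hypothetical CSV content; in practice this would be read from a file.
csv <- "name,age\nann,17\nbob,18\ncal,25"

# Declare the expected type of each column up front so 'age' is read
# as integer directly and never becomes a factor.
data <- read.csv(text = csv, colClasses = c("character", "integer"))

str(data)
```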
What is happening is that
typeof(data$age)
is "integer", so as.integer(data$age) just returns those stored integer values.
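To see the pitfall concretely, here is a sketch with a hypothetical factor of ages (again standing in for `data$age`):

```r
# Hypothetical factor of ages; the labels are the ages, but the stored
# values are integer codes pointing into levels(age).
age <- factor(c(17, 18, 17, 25))

typeof(age)      # "integer": the underlying storage type
levels(age)      # "17" "18" "25": the distinct values, as strings
as.integer(age)  # 1 2 1 3: the codes, NOT the ages
```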
But these values are, as Mike's code suggests, the indices (positions)
into the set of unique values that make up the factor, i.e., its levels.
These integer codes always lie in 1, 2, 3, ..., up to the number of levels.
So if you have ages 17, 18, ..., they will be mapped to 1, 2, 3, ...
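A short sketch (hypothetical data, deliberately out of order) showing that the codes are positions into the levels, whatever order the values appear in:

```r
age <- factor(c(25, 17, 18, 17))

codes <- as.integer(age)  # positions into levels(age), which are "17" "18" "25"
codes                     # 3 1 2 1
range(codes)              # always within 1 .. nlevels(age)
levels(age)[codes]        # "25" "17" "18" "17": look up each code's label
```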
This isn't crazy. R exploits this integer representation in many places, and it is very convenient,
but it makes converting the actual values to numbers one step more involved.
D.
--
Director, Data Sciences Initiative, UC Davis
Professor, Dept. of Statistics, UC Davis
http://datascience.ucdavis.edu
http://www.stat.ucdavis.edu/~duncan