State of categorical variables

Benjamin Deonovic

Nov 12, 2015, 5:11:02 PM
to julia-stats
What is the current state of categorical variables (or "factors", as they are called in R)? I know the current implementation is PooledDataArrays in DataFrames. I was wondering what holes still exist. Will categorical variables/factors be usable outside of a data frame? And what about an equivalent of R's well-known table function? I believe that is still missing.

Milan Bouchet-Valat

Nov 12, 2015, 5:43:03 PM
to julia...@googlegroups.com
PooledDataArrays are perfectly usable outside of DataFrames.

John experimented with a replacement, but I'm not sure what the
status of his work is at the moment:
https://github.com/johnmyleswhite/CategoricalData.jl


Regarding an equivalent of table(), see
http://statsbasejl.readthedocs.org/en/latest/counts.html

as well as this small package of mine:
https://github.com/nalimilan/FreqTables.jl

It doesn't currently work with Julia 0.4, but that should be easy to fix
(I'll do it soon).


Regards

Benjamin Deonovic

Nov 12, 2015, 6:34:21 PM
to julia-stats
counts doesn't give the same output as table: you have to pass it the levels of the variable yourself, and the levels can only be given as "ranges". Thanks for the other suggestions.
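
To make the distinction above concrete, here is a minimal sketch in plain Python. It is not the StatsBase API; the function names `counts_over_range` and `table_like` are hypothetical and chosen only for illustration. The first function counts over an integer range that the caller must supply, while the second discovers the levels from the data, which is closer to what R's table does.

```python
from collections import Counter

def counts_over_range(values, levels):
    """Count occurrences of each integer in `levels`, a range supplied by the caller."""
    tally = Counter(values)
    # Levels that never occur in the data still get an entry, with count 0.
    return [tally[level] for level in levels]

def table_like(values):
    """Discover the levels from the data and count each one, ordered by level."""
    tally = Counter(values)
    return {level: tally[level] for level in sorted(tally)}

x = [1, 3, 3, 2, 3]
print(counts_over_range(x, range(1, 5)))        # caller must know the levels are 1..4
print(table_like(["b", "a", "b", "c", "b"]))    # levels inferred, any sortable type
```

The range-based version cannot handle string levels at all, which is the gap described above.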

Douglas Bates

Nov 15, 2015, 1:09:19 PM
to julia-stats
See countmap in StatsBase for generating a table.

Benjamin Deonovic

Nov 16, 2015, 6:06:49 PM
to julia-stats
countmap doesn't do cross-tabulation of two variables, e.g. R's table(x, y).
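
For reference, this is roughly what a two-way table computes, sketched in plain Python under the assumption that both variables are equal-length sequences of sortable values. The function name `crosstab` is hypothetical and does not correspond to any function in StatsBase or FreqTables.

```python
from collections import Counter

def crosstab(x, y):
    """Return (row_levels, col_levels, matrix), where matrix[i][j] is the
    number of positions k with x[k] == row_levels[i] and y[k] == col_levels[j]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = Counter(zip(x, y))   # count each (x, y) combination once per row
    rows = sorted(set(x))        # levels are inferred from the data
    cols = sorted(set(y))
    matrix = [[pairs[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix

x = ["a", "b", "a", "a", "b"]
y = ["u", "u", "v", "u", "v"]
rows, cols, matrix = crosstab(x, y)
print(" ", cols)
for r, line in zip(rows, matrix):
    print(r, line)
```

A one-way count only needs a dictionary keyed by value; the cross-tabulation needs counts keyed by pairs and then laid out on both sets of levels, including zero cells for combinations that never occur.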