rownames in data matrices


Michael Borregaard

Sep 8, 2015, 3:27:35 AM
to julia-users
Hi, I am learning Julia, and thought I would practice by migrating some of my code from R. I have run into the problem that the data structures in Julia do not seem to offer row names. I looked here in the forum and found a 2-year-old discussion that seems to end on the, IMHO slightly imprecise, opinion that rownames are a misfeature (https://groups.google.com/forum/#!searchin/julia-users/rownames$20matrix/julia-users/OFbnNLPdWOc/qTnhlm33YzMJ). Of course, in a DataFrame it is always possible to add another column with names and use it by convention, though that precludes the nice behaviour of automatically getting a named array with the appropriate names when extracting a column.

The case is worse for data matrices, which do not support multiple types. Take for instance an ecological community matrix that has species as columns and sites as rows, and is filled with integers counting the abundance of individuals. Subsetting a column gives the occupancy of a species; subsetting a row gives the species community at a site. Having row and column names, and being able to index into the array by name, is a really important feature!

How would this be implemented in Julia? Is there still a conviction that row names are a misfeature, and if so, why?

Thanks!

Andreas Lobinger

Sep 8, 2015, 7:23:04 AM
to julia-users
Hello colleague,

To a first approximation, I think this could be emulated by a dictionary mapping each row name to an index into a matrix or DataFrame.
AFAICS, calling this a 'misfeature' comes from attempts to make a matrix datatype that has row names by default. Many people with a numerics/engineering background reserve the name matrix for the simplest possible form: a rectangular array with single-number entries and integer row and column indexing.
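
A minimal sketch of the dictionary idea in plain Julia, with no packages. It assumes a current Julia version, and the data and all names in it (`abundance`, `site_index`, `species_index`, `by_name`) are made up for illustration, loosely following the community matrix from the question:

```julia
# Hypothetical community matrix: rows are sites, columns are species.
abundance = [3 0 12;
             0 5  7]

site_names    = ["meadow", "forest"]
species_names = ["fox", "hare", "vole"]

# Map each name to its integer position along the matching dimension.
site_index    = Dict(name => i for (i, name) in enumerate(site_names))
species_index = Dict(name => j for (j, name) in enumerate(species_names))

# Translate names to integers, then index the plain matrix as usual.
by_name(site, species) = abundance[site_index[site], species_index[species]]

vole_counts   = abundance[:, species_index["vole"]]   # one species across all sites
forest_counts = abundance[site_index["forest"], :]    # all species at one site
```

The matrix itself stays an ordinary integer array; the names live next to it rather than inside it, so the dictionaries have to be kept in sync by hand whenever rows or columns are reordered or dropped.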

So what you are looking for, a rectangular collection accessible by name along both rows and columns, is something new and should have a different name. You could browse the DataFrames development and see if there are enough hooks to extend it in this direction.

Bringing this into Julia as a package (written in Julia) should not be that complicated if it is clearly defined (but someone is still needed to implement it).


Tamas Papp

Sep 8, 2015, 7:33:41 AM
to julia...@googlegroups.com
AFAIK https://github.com/davidavdav/NamedArrays.jl already does this and
is maintained actively.

Best,

Tamas

Michael Krabbe Borregaard

Sep 8, 2015, 7:59:24 AM
to julia...@googlegroups.com
Thanks, it looks like that package will do the trick!

Cedric St-Jean

Sep 8, 2015, 8:37:38 AM
to julia-users
DataFrame behavior has been discussed many times, e.g. https://groups.google.com/d/msg/julia-users/8UFnEIfIW0k/QNEustV9BQAJ. Short answer: row names are being considered, but there is a bit of a philosophical difference, so they are not guaranteed to happen.

Michael Krabbe Borregaard

Sep 8, 2015, 8:59:20 AM
to julia...@googlegroups.com
Interesting to follow that discussion, thanks. I can see the philosophical arguments not to, though I think rownames are intuitive and nice.