Determining the unique values in an array and cross-tabulating arrays

Douglas Bates

May 31, 2012, 12:06:21 PM
to juli...@googlegroups.com
See https://gist.github.com/2844387 for a description

My first question is whether an iterator over a Set{T} should yield values of type T instead of type Any. The second is whether I am reinventing the wheel. My goal is functionality like R's `table` or `xtabs` functions.
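The gist itself is not reproduced in this thread. As a rough illustration of the kind of functionality being asked about, here is a minimal sketch in present-day Julia syntax (an assumption; the 2012 syntax used elsewhere in this thread differs). The names `tabulate` and `crosstab` are hypothetical and are not taken from the gist or from any library.

```julia
# Hypothetical helper: count how often each distinct value occurs,
# roughly what R's `table` does for a single vector.
function tabulate(v::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()          # keys keep the element type of the input
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

# Hypothetical helper: two-way cross-tabulation of equal-length vectors.
# Returns the sorted row levels, the sorted column levels, and a count matrix.
# Assumes the values of each vector can be sorted.
function crosstab(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || throw(ArgumentError("vectors must have equal length"))
    rows = sort(unique(a))
    cols = sort(unique(b))
    rowindex = Dict(r => i for (i, r) in enumerate(rows))
    colindex = Dict(c => j for (j, c) in enumerate(cols))
    counts = zeros(Int, length(rows), length(cols))
    for (x, y) in zip(a, b)
        counts[rowindex[x], colindex[y]] += 1
    end
    return rows, cols, counts
end

t = tabulate([1, 2, 2, 3, 3, 3])
rows, cols, counts = crosstab(["a", "b", "a"], [1, 1, 2])
```

Here the unique values fall out as the keys of the returned Dict, and `counts[i, j]` is the number of positions where the first vector equals `rows[i]` and the second equals `cols[j]`.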

John Myles White

May 31, 2012, 12:33:56 PM
to juli...@googlegroups.com, juli...@googlegroups.com
I think you are not reinventing the wheel: I also found myself implementing this functionality a while back using the same approach. It would be good to have a canonical version of table.

 -- John

Tom Short

May 31, 2012, 1:05:33 PM
to juli...@googlegroups.com
On Thu, May 31, 2012 at 12:06 PM, Douglas Bates <dmb...@gmail.com> wrote:
> See https://gist.github.com/2844387 for a description
>
> The second is whether I am reinventing the
> wheel?  My goal is functionality like R's `table` or `xtabs` functions.

Harlan Harris's work on PooledDataVecs (a factor-like type) has some
commonality, at least for finding unique values:

https://groups.google.com/d/msg/julia-dev/d9aE-njMRU4/N6JMFA_PDlMJ
https://github.com/HarlanH/julia/blob/Data/extras/data.jl

Harlan Harris

May 31, 2012, 2:12:57 PM
to juli...@googlegroups.com
Yeah, looks good. Whatever standard functions are defined for vectors/matrices/sets of vectors can be overloaded for DataVecs/DataFrames of various sorts, some of which may be able to pull unique values and crosstabs in sublinear or constant time...

 -Harlan

Stefan Karpinski

May 31, 2012, 2:48:29 PM
to juli...@googlegroups.com
On Thu, May 31, 2012 at 12:06 PM, Douglas Bates <dmb...@gmail.com> wrote:
> My first question is whether an iterator from a Set{T} should have values of type T instead of type Any.

Presumably. Is that not what's happening? It's possible the type inference isn't working here.

The general policy I've been using for typing arguments and return values is Postel's law in a slight reformulation: be conservative in what you return, be liberal in what you accept. In other words, return things with as sharp type information as you can, but accept things with as loose types as you can. For example, if you have a pair of functions, one which produces a Dict and another which accepts a Dict, you probably want to produce a Dict with specific key and value types, rather than just making a Dict{Any,Any} and returning it. The paired function can be more liberal, however, allowing it to be used even if the passed in Dict isn't quite of the expected type.
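A minimal sketch of that policy in present-day Julia syntax (an assumption; the abstract dictionary type had a different name in 2012). The function names `wordlengths` and `total` are hypothetical and exist only for this illustration.

```julia
# Return side: be conservative. Build the Dict with concrete key and
# value types instead of returning a Dict{Any,Any}.
function wordlengths(words)
    lengths = Dict{String,Int}()
    for w in words
        lengths[String(w)] = length(w)
    end
    return lengths
end

# Accept side: be liberal. Any dictionary-like container with numeric
# values is accepted, whatever its declared key and value types.
function total(d::AbstractDict)
    s = 0
    for v in values(d)
        s += v
    end
    return s
end

sharp = wordlengths(["a", "bcd"])        # a Dict{String,Int}
loose = Dict{Any,Any}("x" => 1.5)        # loosely typed, still accepted by `total`
total(sharp)
total(loose)
```

The producer gives callers sharp type information to work with, while the consumer still works when handed a dictionary that is not quite of the expected type.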

Douglas Bates

May 31, 2012, 3:24:49 PM
to juli...@googlegroups.com

On Thursday, May 31, 2012 1:48:29 PM UTC-5, Stefan Karpinski wrote:
> On Thu, May 31, 2012 at 12:06 PM, Douglas Bates <dmb...@gmail.com> wrote:
> > My first question is whether an iterator from a Set{T} should have values of type T instead of type Any.
>
> Presumably. Is that not what's happening? It's possible the type inference isn't working here.

It doesn't appear so

julia> ss = Set{Int}()
Set{Int64}()

julia> for i in 1:10 add(ss, i) end

julia> typeof(ss)
Set{Int64}
Methods for generic function Set
Set() at set.jl:4
Set(Any...,) at set.jl:5

julia> typeof([s for s in ss])
Array{Any,1}

Patrick O'Leary

May 31, 2012, 5:06:14 PM
to juli...@googlegroups.com
On Thursday, May 31, 2012 2:24:49 PM UTC-5, Douglas Bates wrote:

> On Thursday, May 31, 2012 1:48:29 PM UTC-5, Stefan Karpinski wrote:
> > On Thu, May 31, 2012 at 12:06 PM, Douglas Bates <dmb...@gmail.com> wrote:
> > > My first question is whether an iterator from a Set{T} should have values of type T instead of type Any.
> >
> > Presumably. Is that not what's happening? It's possible the type inference isn't working here.
>
> It doesn't appear so
>
> julia> ss = Set{Int}()
> Set{Int64}()
>
> julia> for i in 1:10 add(ss, i) end
>
> julia> typeof(ss)
> Set{Int64}
> Methods for generic function Set
> Set() at set.jl:4
> Set(Any...,) at set.jl:5
>
> julia> typeof([s for s in ss])
> Array{Any,1}

That looks more like the Anyness that comes from using a comprehension. Have you tried it with map() instead?
 

Stefan Karpinski

May 31, 2012, 5:08:57 PM
to juli...@googlegroups.com
Using map will give the expected result because it actually uses the (imo awful) "trick" of taking the result type from the first element. Comprehensions and map should really do the same thing.
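One way to avoid depending on what a comprehension infers is to state the element type explicitly. The following is a minimal sketch in present-day Julia syntax (an assumption: `push!` stands in for the `add` call used earlier in the thread, and current releases may infer the element type of a plain comprehension differently from the 2012 build shown above).

```julia
ss = Set{Int}()
for i in 1:10
    push!(ss, i)    # present-day spelling of the thread's `add(ss, i)`
end

# Typed comprehension: the element type is written out, not inferred.
typed = Int[s for s in ss]

# Same idea without hard-coding the type: ask the Set for its element type.
generic = eltype(ss)[s for s in ss]

# `collect` builds an array whose element type follows the Set's.
collected = collect(ss)
```

All three results carry the Set's element type regardless of how inference treats the comprehension body. A Set has no guaranteed iteration order, so sort the result if order matters.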