On Thursday, 26 May 2016 at 15:16 -0400, Jeff Bezanson wrote:
> Is this de-coupled from the notion of categorical data? I want
> something that just does the pooling optimization automatically for
> all types T, without separately defining the pool or adding any new
> ordering behavior. It would probably also be good to store the pool
> sorted for fast lookup, but that's a bonus.
It depends on what you mean by "categorical data". CategoricalArray
stores a CategoricalPool, but that's mostly invisible to the user. When
indexed, it returns CategoricalValue objects which are immutable
wrappers storing the value (i.e. the string) and a reference to the
pool. In practice it should be usable as a string in many cases.
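
To make the layout described above concrete, here is a minimal, language-neutral sketch in Python. It is not the actual CategoricalArray code: the names Pool, PooledValue and PooledArray, and all of their methods, are hypothetical and chosen only for illustration. It assumes the pool keeps each distinct value once, the array stores small integer codes, and indexing returns an immutable wrapper holding a code and a reference to the pool.

```python
from dataclasses import dataclass


class Pool:
    """Hypothetical pool: stores each distinct value exactly once."""

    def __init__(self):
        self.levels = []   # distinct values, in order of first appearance
        self.index = {}    # value -> position in `levels`

    def ref(self, value):
        """Return the integer code for `value`, adding it to the pool if new."""
        if value not in self.index:
            self.index[value] = len(self.levels)
            self.levels.append(value)
        return self.index[value]


@dataclass(frozen=True)
class PooledValue:
    """Immutable wrapper: a reference to the pool plus a code into it."""
    pool: Pool
    code: int

    @property
    def value(self):
        return self.pool.levels[self.code]

    def __str__(self):
        # Lets the wrapper stand in for the underlying string when printed.
        return str(self.value)


class PooledArray:
    """Hypothetical array that stores integer codes instead of the values."""

    def __init__(self, values):
        self.pool = Pool()
        self.codes = [self.pool.ref(v) for v in values]

    def __len__(self):
        return len(self.codes)

    def __getitem__(self, i):
        return PooledValue(self.pool, self.codes[i])


arr = PooledArray(["a", "b", "a", "c", "a"])
print(arr.codes)        # integer codes, one per element
print(arr.pool.levels)  # each distinct value stored once
print(str(arr[2]))      # the wrapper prints like the underlying string
```

The user only ever touches PooledArray and the values it returns; the pool stays an implementation detail, which is the sense in which it is "mostly invisible".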
Then there's OrdinalArray, which adds an ordering to the values. By
default, that ordering follows the order in which the levels appear in
the data, or the order in which they were inserted.
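
As a rough illustration of what ordering by level position means, here is a small Python sketch. It is an assumption-laden toy, not the OrdinalArray implementation: OrdinalValue and ordinal_values are hypothetical names, and the sketch only covers the "order of appearance" default.

```python
from functools import total_ordering


@total_ordering
class OrdinalValue:
    """Hypothetical value that compares by its position among the levels."""

    def __init__(self, levels, code):
        self.levels = levels   # shared list of levels, in order of appearance
        self.code = code       # position of this value in `levels`

    @property
    def value(self):
        return self.levels[self.code]

    def __eq__(self, other):
        return self.levels is other.levels and self.code == other.code

    def __lt__(self, other):
        # Comparison is only defined between values sharing the same levels.
        if self.levels is not other.levels:
            raise ValueError("values come from different level sets")
        return self.code < other.code


def ordinal_values(data):
    """Build ordinal values whose levels follow order of first appearance."""
    levels = []
    position = {}
    out = []
    for item in data:
        if item not in position:
            position[item] = len(levels)
            levels.append(item)
        out.append(OrdinalValue(levels, position[item]))
    return out


vals = ordinal_values(["low", "medium", "high", "low"])
print(vals[0] < vals[2])                # compares level positions
print("low" < "high")                   # compares the raw strings instead
print([v.value for v in sorted(vals)])  # sorted by level position
```

The point of the sketch is that the ordering belongs to the levels, not to the underlying type: the wrapped strings would sort alphabetically, while the ordinal values sort by where their level sits in the level list.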
Does CategoricalArray suit your needs? The main difference from PDAs is
that it doesn't try to behave like a standard array by supporting every
operation that the underlying type supports.
Regards