Hi Hans Martin,
On 14 October 2012 21:39, Hans-Martin von Gaudecker
<
hmgau...@gmail.com> wrote:
> In general, I often struggle with the distinction between columns contained
> in a MultiIndex and "normal" columns. In SQL and this particular example, I
> would set a primary key and an "order by" on columns A, B, C, but otherwise
> there is no difference between these columns and any other columns. I can
> see why there is a stronger distinction being made in Pandas, but it seems
> like I can't get my head around how to harness it best. Am I using it in a
> fundamentally wrong way? Should I just keep duplicate "normal" columns? Or
> is there another recommended way?
I totally agree.
It is also a problem for joins. Here is my experience:
Even if your index is named, it has to be singled out with e.g. left_index=True.
In most domains it is possible to keep unique global naming (because
of a reduced number of tables and/or columns) and then the columns
would be matched automatically by name ... if they were not at the
same time used as indexes.
I hadn't written to the list so far because I am not sure of what is
the best solution . Ideally one should be able, as you say, to treat
indexes as regular columns!
My current solution is to keep two copies, one as an index, one as a column
df.set_index(index_that_i_want, inplace=True, drop=False)
The drop option keeps the index amongst the columns. I can do this
because a) I have enough space (but who knows in the future...) and b)
because my columns are only ever 'bulk updated', there is no risk of
column and index getting out of sync (my scenario is more OLAP than
OLTP).
What are thoughts on this issue? Is this something really unavoidable?
Duality of column types (normal vs. index) seems to me the greatest
inconvenience so far in the use of Pandas, although it has been well
solved by drop=False so far.
Thanks for bringing this up!
-á.