Get unique counts from Pandas MultiIndex


Kent Maxwell

Feb 24, 2018, 8:30:00 AM
to PyData
Hi!

Consider this sample code:

import pandas as pd

index_list = [(1170308443, 230, 2),
              (1170310283, 10, 1),
              (1170307300, 30, 2),
              (1170308400, 2010, 2),
              (1170310283, 100, 1),
              (1170305943, 20, 2),
              (1170307300, 30, 1),
              (1170308419, 380, 2),
              (1170310284, 10, 1),
              (1170310284, 20, 1),
              (1170310309, 40, 1),
              (1170310310, 10, 1),
              (1170310312, 10, 2),
              (1170310313, 60, 1),
              (1170310313, 70, 1),
              (1170310316, 40, 2),
              (1170309498, 30, 2),
              (1170308419, 390, 2),
              (1170309304, 30, 2),
              (1170309304, 100, 1)]

index = pd.MultiIndex.from_tuples(index_list, names=['doc_num', 'doc_line', 'sub_line'])


This outputs all the unique values at a specific level, but how can I get a count from this unique output?
index.get_level_values('doc_num').unique()

The following code doesn't work, but how could I get the unique values, or even better a count of the unique values, across two levels of the MultiIndex?
index.get_level_values('doc_num', 'doc_line').unique

Thanks!

Kent

Pietro Battiston

Feb 24, 2018, 11:38:24 AM
to pyd...@googlegroups.com
On Sat, 24/02/2018 at 05:30 -0800, Kent Maxwell wrote:
> [...]
> index_list = [(1170308443, 230, 2),
> [...]
>               (1170309304, 100, 1)]
>
>
> index = pd.MultiIndex.from_tuples(index_list, names=['doc_num',
> 'doc_line', 'sub_line'])
>
>
> This outputs all the unique values at the specific level, but how can
> I get a count from this unique output?
> index.get_level_values('doc_num').unique

index.get_level_values('doc_num').value_counts()

?
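
For reference, a small sketch of what `value_counts()` returns compared with a single count of distinct values. The index below is made-up sample data rather than the data from the question, and `nunique()` is only one possible way to get the single number; it is not something proposed in the thread.

```python
import pandas as pd

# Illustrative index (hypothetical values, same level names as the question).
index = pd.MultiIndex.from_tuples(
    [(100, 10, 1), (100, 20, 1), (200, 10, 2), (300, 10, 1), (300, 10, 2)],
    names=['doc_num', 'doc_line', 'sub_line'],
)

doc_nums = index.get_level_values('doc_num')

# value_counts() gives one row per distinct doc_num with its number of occurrences.
per_value = doc_nums.value_counts()

# nunique() gives the number of distinct doc_num values as a single integer.
n_unique = doc_nums.nunique()

print(per_value)
print(n_unique)
```

So `value_counts()` answers "how often does each value occur", while the length of that result (or `nunique()`) answers "how many different values are there".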

>
> This code doesn't work, but how could I get unique values, or even
> better a count of unique values with two levels from the multiindex?
> index.get_level_values('doc_num', 'doc_line').unique

Do you mean "count of unique pairs of values"?

I can't think of anything better than

index.to_frame().groupby(['doc_num', 'sub_line']).size()

Pietro

Kent Maxwell

Feb 24, 2018, 12:58:25 PM
to pyd...@googlegroups.com
Hi! Thanks! This helps.

I needed to adjust your code slightly to get the exact results I am looking for, but you pointed me in the right direction:

# get count of doc_num
index.get_level_values('doc_num').unique().shape[0]

# get count of doc_num, doc_line
index.to_frame().groupby(['doc_num', 'doc_line']).size().count()
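
As a check, here is a sketch that runs both counts on the sample data from the first message. One detail is an assumption added here, not part of the thread: `to_frame(index=False)` is used because, as far as I know, recent pandas versions reject a `groupby` key that is both an index level name and a column name as ambiguous, and plain `to_frame()` produces exactly that situation.

```python
import pandas as pd

index_list = [(1170308443, 230, 2), (1170310283, 10, 1), (1170307300, 30, 2),
              (1170308400, 2010, 2), (1170310283, 100, 1), (1170305943, 20, 2),
              (1170307300, 30, 1), (1170308419, 380, 2), (1170310284, 10, 1),
              (1170310284, 20, 1), (1170310309, 40, 1), (1170310310, 10, 1),
              (1170310312, 10, 2), (1170310313, 60, 1), (1170310313, 70, 1),
              (1170310316, 40, 2), (1170309498, 30, 2), (1170308419, 390, 2),
              (1170309304, 30, 2), (1170309304, 100, 1)]

index = pd.MultiIndex.from_tuples(
    index_list, names=['doc_num', 'doc_line', 'sub_line'])

# Number of distinct doc_num values.
n_doc_num = index.get_level_values('doc_num').unique().shape[0]

# Number of distinct (doc_num, doc_line) pairs.
# index=False (an addition, see above) keeps the levels as plain columns only.
pairs = index.to_frame(index=False).groupby(['doc_num', 'doc_line']).size()
n_pairs = pairs.count()

print(n_doc_num, n_pairs)  # 14 19
```

Only the pair (1170307300, 30) occurs twice in the sample, which is why 20 tuples give 19 distinct pairs.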



Pietro Battiston

Feb 24, 2018, 1:56:39 PM
to pyd...@googlegroups.com
On Sat, 24/02/2018 at 11:58 -0600, Kent Maxwell wrote:
> Hi!  Thanks!  this helps.
>
> I needed to adjust your code a slight to get the exact results I am
> looking for, but you got me in the right direction:

OK, I had misunderstood you...

>
> # get count of doc_num
> index.get_level_values('doc_num').unique().shape[0]

Equivalent, but maybe more standard:

len(index.get_level_values('doc_num').unique())

Note that starting with the next pandas release,

len(index.unique(level='doc_num'))

is recommended.
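
A minimal sketch of the `level` form, assuming a pandas version in which `unique()` accepts that argument (to my knowledge 0.23 or later). The index is made-up sample data with the same level names as above.

```python
import pandas as pd

# Illustrative index (hypothetical values).
index = pd.MultiIndex.from_tuples(
    [(100, 10, 1), (100, 20, 1), (200, 10, 2)],
    names=['doc_num', 'doc_line', 'sub_line'],
)

# unique(level=...) returns the distinct values of one level as an Index.
unique_doc_nums = index.unique(level='doc_num')
n_doc_num = len(unique_doc_nums)

# The older spelling gives the same count.
assert n_doc_num == len(index.get_level_values('doc_num').unique())

print(n_doc_num)  # 2
```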

> # get count of doc_num, doc_line
> index.to_frame().groupby(['doc_num', 'doc_line']).size().count()

Even better:

len(index.to_frame().groupby(['doc_num', 'doc_line']))


Pietro
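
For anyone trying this on a current pandas: a sketch of three ways to count distinct (doc_num, doc_line) pairs, on made-up sample data. The `droplevel` and `drop_duplicates` variants are alternatives added here, not suggestions from the thread, and `to_frame(index=False)` is an assumption on my part to avoid what I believe is an ambiguity error when a `groupby` key is both an index level and a column.

```python
import pandas as pd

# Illustrative index (hypothetical values); (100, 10) occurs twice as a pair.
index = pd.MultiIndex.from_tuples(
    [(100, 10, 1), (100, 10, 2), (100, 20, 1), (200, 10, 1)],
    names=['doc_num', 'doc_line', 'sub_line'],
)

# Option 1: drop the level to ignore, then count the distinct remaining tuples.
n_pairs_droplevel = len(index.droplevel('sub_line').unique())

# Option 2: turn the levels into columns and drop duplicate rows.
frame = index.to_frame(index=False)
n_pairs_frame = len(frame[['doc_num', 'doc_line']].drop_duplicates())

# Option 3: the groupby form from the thread; len() of a groupby is the group count.
n_pairs_groupby = len(frame.groupby(['doc_num', 'doc_line']))

print(n_pairs_droplevel, n_pairs_frame, n_pairs_groupby)  # 3 3 3
```

Option 1 stays entirely within the index and avoids building a DataFrame, which may be preferable when only the count is needed.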