Performance issues with .ix/.loc with pandas v0.19.1/numpy v1.12.0

hansiz...@gmail.com

Dec 16, 2016, 9:35:42 PM

to PyData

Dear all,

I am using pandas to handle huge amounts of data with Python. Most of the time it works like a charm. Thank you for that.

Recently, however, I discovered a performance issue that appeared after upgrading my Python packages (among other upgrades, pandas v0.18.1 --> v0.19.1 and numpy v1.11.1 --> v1.12.0).
My application was significantly slower after this upgrade, and I traced the slowdown to .ix and .loc calls. Some of those calls took about a second each.
More specifically, I use .ix and/or .loc to fetch rows from a dataframe df, which has md5 hash values as its index:

row = df.ix[hash]
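
To make the slowdown easier to compare across versions, a lookup like the one above could be timed in isolation. The following is only a minimal sketch under stated assumptions: the dataframe is a small synthetic stand-in (the `value` column, the row count, and the way the md5 keys are generated are all made up for illustration), and it uses .loc, since .ix was removed in later pandas releases.

```python
import hashlib
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real dataframe is far larger.
n_rows = 100_000
keys = [hashlib.md5(str(i).encode()).hexdigest() for i in range(n_rows)]
df = pd.DataFrame({"value": np.arange(n_rows)}, index=keys)

key = keys[n_rows // 2]

# Label-based lookup of a single row (the .loc equivalent of df.ix[hash]).
row = df.loc[key]

# Average wall-clock time per lookup over repeated calls.
n_calls = 100
seconds = timeit.timeit(lambda: df.loc[key], number=n_calls) / n_calls
print(f"pandas {pd.__version__}, numpy {np.__version__}: "
      f"{seconds:.6f} s per lookup")
```

Running the same script in two environments (before and after the upgrade) would give directly comparable numbers and a self-contained reproducer to attach to a bug report.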

Although I do not believe the size of the dataframe is what causes the issue, I should mention that it is a couple of GB in size, with tens of millions of rows.

Downgrading both pandas and numpy to their previous versions fixed the problem for me (note: I downgraded only those two packages).

Another issue I noticed is that changing the dtype of any of the columns also makes .loc and .ix slow, for instance when I changed the dtype from object to category or bool.
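
The dtype observation could be checked the same way, by timing the same lookup before and after a column is converted. This is a rough sketch rather than a confirmed reproducer: the column names (`label`, `other`), the data, and the helper `time_lookup` are hypothetical, and whether any difference shows up will depend on the installed versions.

```python
import hashlib
import timeit

import pandas as pd

# Hypothetical stand-in data with two object-dtype columns.
n_rows = 50_000
keys = [hashlib.md5(str(i).encode()).hexdigest() for i in range(n_rows)]
df = pd.DataFrame(
    {"label": ["a", "b"] * (n_rows // 2), "other": ["x"] * n_rows},
    index=keys,
)


def time_lookup(frame, key, number=100):
    """Return the average time in seconds of one frame.loc[key] call."""
    return timeit.timeit(lambda: frame.loc[key], number=number) / number


key = keys[0]
before = time_lookup(df, key)

# Convert one column to category, leaving the original frame untouched.
converted = df.copy()
converted["label"] = converted["label"].astype("category")
after = time_lookup(converted, key)

print(f"original dtypes: {before:.6f} s, after conversion: {after:.6f} s")
```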

Are these known issues? I could not find anything related to the current versions of the packages. I solved the problem for myself by downgrading the packages.
Nevertheless, I thought this might be something you should know about.

Again, I want to say thank you for the wonderful work you've done here! It has eased my work a lot!

Best Regards!

John E

Dec 18, 2016, 7:14:50 PM

to PyData

.loc and .iloc are generally recommended over .ix nowadays. Maybe try one of those?

Also, you may be better off posting a question like this on Stack Overflow (with sample data, if at all possible).