Modin alternative for applying a function to a complete column

var...@kaleyra.com

Aug 23, 2019, 6:16:14 AM
to modin-dev
I'm using pandas to do the following task:

```python
import pandas as pd
import phonenumbers

data = pd.read_csv('5lac.csv')  # Read the file (read_csv, since it is a .csv)
data['Mobile'] = data['Mobile'].astype(str).apply(lambda x: "+" + x)  # Prepend "+"
# Parse each number and check whether it is valid
data['Valid'] = data['Mobile'].apply(lambda x: phonenumbers.is_valid_number(phonenumbers.parse(x)))
```
I get this warning:
UserWarning: `Series.__getstate__` defaulting to pandas implementation.
Is there an alternative way to do this in Modin?

var...@kaleyra.com

Aug 23, 2019, 6:29:21 AM
to modin-dev
df = pd.read_csv('5lac0.csv')
df = df.astype(str)
df.loc[:, 'Mobile'] = df.loc[:, 'Mobile'].apply(lambda x: '+' + x)
df.loc[:, 'Valid'] = df.loc[:, 'Mobile'].apply(lambda x: phonenumbers.is_valid_number(phonenumbers.parse(x, 'IN')))

This removes the warning, but the time taken is the same as with pandas.

Devin Petersohn

Aug 26, 2019, 2:51:13 PM
to var...@kaleyra.com, modin-dev
The `__getstate__` warning is a bug and was also reported separately on GitHub: https://github.com/modin-project/modin/issues/764. Modin should not be defaulting to pandas for that.

As for the time taken, operating on columns individually may give similar or worse performance than pandas, because a single column is often much smaller than a whole DataFrame. Typically it takes ~10-20MB of data before Modin significantly outperforms pandas, primarily because of the communication costs and overheads of multiprocess computing. We are working on ways to speed up this computation and bring down the overheads, and it should be better within a month or two. Thanks!
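
One way to act on the size rule of thumb above is to check the input file before choosing a library. The following is only a minimal sketch: `pick_backend`, `load_pandas_like`, and the 10MB cutoff are hypothetical names and values chosen for illustration, not part of Modin, and on-disk size is only a rough proxy for in-memory size.

```python
import os

# Assumed cutoff: the low end of the ~10-20MB rule of thumb mentioned above.
MODIN_THRESHOLD_BYTES = 10 * 1024 * 1024


def pick_backend(path, threshold_bytes=MODIN_THRESHOLD_BYTES):
    """Return "modin" if the file is at least threshold_bytes, else "pandas".

    Hypothetical helper: file size on disk only approximates the size of the
    resulting DataFrame in memory.
    """
    return "modin" if os.path.getsize(path) >= threshold_bytes else "pandas"


def load_pandas_like(path):
    """Import and return the pandas-compatible module chosen for this file."""
    if pick_backend(path) == "modin":
        import modin.pandas as pd  # requires Modin to be installed
    else:
        import pandas as pd
    return pd
```

With this, `pd = load_pandas_like('5lac0.csv')` followed by `df = pd.read_csv('5lac0.csv')` would use plain pandas for small files and Modin only for larger ones. The right cutoff depends on the machine and workload, so it is worth timing both on real data.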


