
Unable to convert pandas object to string


Bhaskar Dhariyal

Jun 24, 2017, 5:32:36 AM
<class 'pandas.core.frame.DataFrame'>
Int64Index: 171594 entries, 0 to 63464
Data columns (total 7 columns):
project_id          171594 non-null object
desc                171594 non-null object
goal                171594 non-null float64
keywords            171594 non-null object
diff_creat_laun     171594 non-null int64
diff_laun_status    171594 non-null int64
diff_status_dead    171594 non-null int64
dtypes: float64(1), int64(3), object(3)

I'm not able to convert the desc and keywords columns to strings for preprocessing.
I tried astype(str). Please help.

Paul Barry

Jun 24, 2017, 5:45:35 AM
Any chance you could post one line of data so we can see what we have to
work with?

Also - have you taken a look at Jake VanderPlas's notebooks? There's a lot of
help with pandas to be found there:
https://github.com/jakevdp/PythonDataScienceHandbook

Paul.


--
Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
http://paulbarry.itcarlow.ie - e: paul....@itcarlow.ie
Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.

Paul Barry

Jun 24, 2017, 7:27:23 AM
Hi Bhaskar.

Please see attached PDF of a small Jupyter notebook. As you'll see, the
data in the fields you mentioned are *already* strings. What is it you are
trying to do here?

Paul.

On 24 June 2017 at 10:51, Bhaskar Dhariyal <dhariya...@gmail.com>
wrote:

> train.csv
> <https://drive.google.com/file/d/0B1D4AyluMGU0enoxbElGTV94Q0E/view?usp=drive_web>
> Here it is. Thanks for the quick reply.

Albert-Jan Roskam

Jun 24, 2017, 9:30:28 AM
(sorry for top posting)

Try using fillna('') to convert np.nan into empty strings: df['desc'] = df.desc.fillna(''). Btw, the object dtype is already the closest approximation to str. I wish the object dtype had its own sentinel value for missing data instead of np.nan, which is a float.
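
A minimal sketch of the fillna('') suggestion. The three-row column below is made up for illustration and is not taken from train.csv; dtype=object is set explicitly so the column matches the .info() output in the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poster's data: one description is missing.
df = pd.DataFrame({
    "desc": pd.Series(["a smart watch", np.nan, "board game"], dtype=object),
})

# The missing entry is a float (np.nan) sitting inside an object column.
missing_is_float = isinstance(df["desc"].iloc[1], float)

# Replace missing entries with empty strings, as suggested above.
df["desc"] = df["desc"].fillna("")

# After the fill, every element in the column is a Python str.
all_strings = all(isinstance(value, str) for value in df["desc"])

print(missing_is_float, all_strings)
print(df["desc"].tolist())
```

Whether an empty string is the right replacement depends on the later preprocessing; dropping those rows is another option.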

Paul Barry

Jun 25, 2017, 11:34:42 AM
Forgot to include this reply to the list (as others may want to comment).

---------- Forwarded message ----------
From: Paul Barry <paul.jam...@gmail.com>
Date: 24 June 2017 at 12:21
Subject: Re: Unable to convert pandas object to string
To: Bhaskar Dhariyal <dhariya...@gmail.com>


Note that .info(), according to its docs, gives you a "Concise summary of a
DataFrame". pandas reports columns that hold Python strings as dtype object
(everything is an object in Python, including strings), so the output from
.info() is technically correct (but maybe not very helpful in your case).
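
To illustrate the point about .info(), here is a small sketch with made-up data (not the real train.csv). The column names are borrowed from the question, and dtype=object is set explicitly so the result does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical two-row frame; values are invented for illustration.
df = pd.DataFrame({
    "desc": pd.Series(["a smart watch", "board game"], dtype=object),
    "goal": [500.0, 1200.0],
})

# The column-level dtype is reported as "object" ...
column_dtype = df["desc"].dtype

# ... while each element inside the column is an ordinary Python str.
element_types = {type(value).__name__ for value in df["desc"]}

print(column_dtype)
print(element_types)
```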

As I've shown, we can work out that the data you want to work with is in
fact a string, so I've added some code to my notebook to show you how to
tokenize the first row of data. This should get you started on doing this
to the rest of your data.
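
The notebook itself is not reproduced in this thread, so the following is only a sketch of one simple way to tokenize, not necessarily the approach in the attachment. The tokenize helper and the sample rows are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample rows standing in for the desc column.
df = pd.DataFrame({
    "desc": pd.Series(
        ["A smart watch, for kids!", "Cooperative board game"], dtype=object
    ),
})


def tokenize(text):
    """Lowercase the text and return its runs of word characters."""
    return re.findall(r"\w+", text.lower())


# Tokenize just the first row ...
first_row_tokens = tokenize(df["desc"].iloc[0])

# ... then apply the same function to every row of the column.
all_tokens = df["desc"].map(tokenize)

print(first_row_tokens)
print(all_tokens.tolist())
```

This assumes every value in the column is a string; a non-string value would raise an error inside tokenize.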

Note, too, that some of the data in these specific columns contains
something other than a string, so you'll need to clean that up first (see
the end of the updated notebook, attached, for how I worked out that this
was indeed the case).
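
One possible way to find such values, sketched with invented data (the check in the attached notebook may differ):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings with a missing value and a number.
df = pd.DataFrame({
    "desc": pd.Series(["a smart watch", np.nan, "board game", 42], dtype=object),
})

# True where the element is a Python str, False otherwise.
is_text = df["desc"].map(lambda value: isinstance(value, str)).astype(bool)

# Rows that need attention before tokenizing.
bad_rows = df[~is_text]

# One possible cleanup: keep only the rows that hold real strings.
clean = df[is_text]

print(bad_rows.index.tolist())
print(clean["desc"].tolist())
```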

I hope this all helps.

Paul.



On 24 June 2017 at 11:31, Bhaskar Dhariyal <dhariya...@gmail.com>
wrote:

> The data type shown there is object (see In[4] on the first page). I wanted
> to tokenize the name & desc columns and clean them.