Returning a series with length and index


Max Russell

Jun 9, 2017, 9:54:06 AM
to PyData
I am working with the scikit-learn Wisconsin breast cancer dataset.

I've created a DataFrame whose last columns look like this:
   worst concave points  worst symmetry  worst fractal dimension  target
0               0.26540          0.4601                  0.11890     0.0
1               0.18600          0.2750                  0.08902     0.0
2               0.24300          0.3613                  0.08758     0.0

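For a reproducible example, the three rows above can be rebuilt by hand. This is only a sketch: the values are copied from the excerpt, and the real DataFrame also has all the other feature columns.

```python
import pandas as pd

# The three rows shown above, copied by hand (the full DataFrame has more feature columns).
df = pd.DataFrame(
    {
        "worst concave points": [0.26540, 0.18600, 0.24300],
        "worst symmetry": [0.4601, 0.2750, 0.3613],
        "worst fractal dimension": [0.11890, 0.08902, 0.08758],
        "target": [0.0, 0.0, 0.0],  # float, as in the excerpt
    }
)
print(df)
```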
I'm trying to create a pd.Series based on target, which can be 0 or 1, and give it the index 'Malignant', 'Benign'.

I've tried the following:
output = pd.Series([0, 1], index=['Malignant', 'Benign'])

with output being:
no     0
yes    1
dtype: int64
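A Series like the one attempted above maps label to code. If the goal is to turn the 0/1 codes into labels, one option (a sketch, not the only way) is to invert it into a code-to-label Series and pass that to Series.map. The small target column here is hypothetical.

```python
import pandas as pd

# label -> code, built with pd.Series (not df.Series)
codes = pd.Series([0, 1], index=["Malignant", "Benign"])

# Invert to code -> label so it can serve as a lookup table.
labels = pd.Series(codes.index.to_numpy(), index=codes.to_numpy())

target = pd.Series([0, 0, 1])  # hypothetical target column
named = target.map(labels)     # looks each code up in labels' index
print(named.tolist())
```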

I also tried a mapping:
    status = {0: 'Malignant', 1: 'Benign'}
    cancerdf['target'] = pd.to_numeric(cancerdf['target'], errors='coerce').fillna(2).astype(int).map(status)

where cancerdf is my whole DataFrame, including the target column.

However, this throws a TypeError:
TypeError: tuple indices must be integers or slices, not str
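The mapping chain itself works on a DataFrame; the sketch below runs it on a small hypothetical cancerdf with one unparseable value to show what each step does. The reported message is what plain Python raises when a tuple is indexed with a string, so it may be worth checking whether cancerdf is really a DataFrame at that point (an assumption, since the thread does not show how it was created).

```python
import pandas as pd

# Hypothetical stand-in for cancerdf, with one value that is not a number.
cancerdf = pd.DataFrame(
    {"worst symmetry": [0.4601, 0.2750, 0.3613], "target": ["0", "1", "unknown"]}
)

status = {0: "Malignant", 1: "Benign"}
cancerdf["target"] = (
    pd.to_numeric(cancerdf["target"], errors="coerce")  # "unknown" -> NaN
    .fillna(2)     # NaN -> 2, a code with no entry in `status`
    .astype(int)
    .map(status)   # 2 has no entry, so it becomes NaN again
)
print(cancerdf["target"].tolist())

# Indexing a tuple (not a DataFrame) with a string reproduces the reported error.
result = (cancerdf, "something else")  # hypothetical tuple
try:
    result["target"]
except TypeError as exc:
    message = str(exc)
print(message)
```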


I'm trying to understand how to get this kind of mapping from the column so I can move on to further processing.

Thanks very much.

Joris Van den Bossche

Jun 9, 2017, 11:13:13 AM
to PyData
It's not fully clear what end result you want to obtain. Can you provide a reproducible example with the desired output?

Joris


Peter Leimbigler

Jun 10, 2017, 1:45:54 PM
to PyData
If you intend to replace 0 with 'Malignant' and 1 with 'Benign':

df['target'] = df['target'].replace({0: 'Malignant', 1: 'Benign'})
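One design note on replace versus map: they treat values without a dictionary entry differently. In this sketch, the code 2 is a hypothetical value with no label.

```python
import pandas as pd

target = pd.Series([0, 1, 2])  # 2 is a hypothetical code with no label
labels = {0: "Malignant", 1: "Benign"}

replaced = target.replace(labels)  # values without a key are left unchanged
mapped = target.map(labels)        # values without a key become NaN
print(replaced.tolist())
print(mapped.tolist())
```

So replace is the safer choice if unexpected codes should survive, while map makes them easy to spot with isna().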

Complete example:

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
# np.c_ stacks the features and the target into one float array
df = pd.DataFrame(data=np.c_[data['data'], data['target']],
                  columns=list(data['feature_names']) + ['target'])
# Assign the result back rather than using inplace=True on a column selection
df['target'] = df['target'].replace({0: 'Malignant', 1: 'Benign'})
df.head()
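An alternative, sketched here as an option rather than the approach above, is to build the column as a categorical straight from the integer codes and the dataset's own label names with pd.Categorical.from_codes. The arrays below are hypothetical stand-ins for data['target'] and data['target_names'] (which are lowercase, 'malignant' and 'benign'). The codes must be integers, so use data['target'] directly rather than the float column produced by np.c_.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data['target'] and data['target_names'].
codes = np.array([0, 0, 1])
target_names = np.array(["malignant", "benign"])

# Code 0 -> target_names[0], code 1 -> target_names[1]
labels = pd.Categorical.from_codes(codes, categories=target_names)
df = pd.DataFrame({"target": labels})
print(df["target"].tolist())
```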

Best,
Peter