Returning a series with length and index

Max Russell

Jun 9, 2017, 9:54:06 AM

to PyData

I am working with the scikit-learn Wisconsin breast cancer dataset.

I've created a DataFrame whose last columns look like this:

      worst concave points  worst symmetry  worst fractal dimension  target  
 0                 0.26540          0.4601                  0.11890     0.0  
 1                 0.18600          0.2750                  0.08902     0.0  
 2                 0.24300          0.3613                  0.08758     0.0

I'm trying to create a pd.Series based on the target column, whose values can be 0 or 1, and give that Series the index 'malignant', 'benign'.

I've tried the following:

output = df.Series([0,1], index= ['Malginant', 'Benign'])

with output being:

no     0
yes    1
dtype: int64

and also tried a mapping:

    status = {0:'Malignant', 1:'Benign'}
    cancerdf['target'] = pd.to_numeric(cancerdf['target'], errors='coerce').fillna(2).astype(int).map(status)

where cancerdf is my whole dataframe with the target column

However, this throws a TypeError:

TypeError: tuple indices must be integers or slices, not str

I'm trying to understand how to get this kind of mapping from the column, in order to move to further processing.

Thanks very much.

Joris Van den Bossche

Jun 9, 2017, 11:13:13 AM

to PyData

It's not fully clear what end result you want to obtain. Can you provide a reproducible example with the desired output?

Joris

Peter Leimbigler

Jun 10, 2017, 1:45:54 PM

to PyData

If you intend to replace 0 with 'Malignant' and 1 with 'Benign':

df['target'] = df['target'].replace({0: 'Malignant', 1: 'Benign'})

Complete example:

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
df = pd.DataFrame(data=np.c_[data['data'], data['target']],
                  columns=list(data['feature_names']) + ['target'])
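# Assign the result back rather than using inplace=True on a selected column,
# which newer pandas versions may not propagate to df.
df['target'] = df['target'].replace({0: 'Malignant', 1: 'Benign'})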
df.head()
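
If the goal is instead a Series that counts each class and is indexed by 'malignant' and 'benign' (a guess based on the thread title, since the desired output was never posted), a minimal sketch could look like the following. The small DataFrame is a made-up stand-in for the real data, and the names labels and counts are only illustrative.

```python
import pandas as pd

# Toy stand-in for the real DataFrame: made-up target values, not the actual dataset.
df = pd.DataFrame({"target": [0.0, 1.0, 1.0, 0.0, 1.0]})

# Same convention as in the thread: 0 is malignant, 1 is benign.
status = {0: "malignant", 1: "benign"}

# Map the numeric codes to labels; the original column is left untouched.
labels = df["target"].astype(int).map(status)

# Count each label and pin the index order to ['malignant', 'benign'].
counts = labels.value_counts().reindex(["malignant", "benign"], fill_value=0)
print(counts)
```

As for the TypeError in the original question: "tuple indices must be integers or slices, not str" is what Python raises when a tuple is indexed with a string, so cancerdf may have been a tuple rather than a DataFrame at that point (for example, load_breast_cancer(return_X_y=True) returns a tuple). That is only an assumption, since the code that builds cancerdf is not shown.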