Changing string vector to enum type in Python


phil....@gmail.com

Jun 8, 2015, 9:33:16 PM
to h2os...@googlegroups.com
I am importing a file, and one variable of many is being imported as type "str". I'd like to convert that to "enum". Is there a way in the Python API to do that? I think in R you do something like:

hex[,4] = as.factor(hex[,4])

I'd like to do the same in Python if possible.

Most of the character fields import as enum, just a few do not.

Parag Sanghavi

Jun 8, 2015, 9:59:01 PM
to Philip Pennie, h2ostream
Hi Phil,




# A little feature engineering
# Add in month-of-year (seasonality; fewer bike rides in winter than summer)
secs = bpd["Days"]*secsPerDay
bpd["Month"]     = secs.month().asfactor()


Let me know if this helps

Parag




--
You received this message because you are subscribed to the Google Groups "H2O & Open Source Scalable Machine Learning  - h2ostream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Parag Sanghavi
Head of Customer Success
H2O.ai
(650) 303-4069

phil....@gmail.com

Jun 9, 2015, 11:20:31 AM
to h2os...@googlegroups.com, phil....@gmail.com
I get this error:

EnvironmentError: h2o-py got an unexpected HTTP status code:
412 Precondition Failed (method = POST; url = http://localhost:54321/3/Rapids).
detailed error messages: Enum conversion only works on integer columns

The field being converted is a string.

Parag Sanghavi

Jun 9, 2015, 3:34:07 PM
to Philip Pennie, h2ostream
Hi Phil,

You will have to parse the dataset again. See the example code below:

import h2o

fraw = h2o.import_file("smalldata/logreg/prostate.csv")  # import the raw file without parsing it
fsetup = h2o.parse_setup(fraw)  # let H2O guess the parse settings
fsetup["column_types"][1] = "Enum"  # change second column "CAPSULE" to categorical
fr = h2o.parse_raw(fsetup)  # parse with the modified settings
fr.describe()

How many levels do you have in this column? We have a limit of 10,000,000.

Parag
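
For readers on a later h2o-py release, the re-parse above can often be avoided by forcing the column type at import time. The sketch below assumes a version where `h2o.import_file` accepts a `col_types` argument; that may not hold for the 2015 release discussed in this thread. The helper names `enum_col_types` and `import_with_enums` are hypothetical, introduced only for illustration.

```python
def enum_col_types(column_names, categorical):
    """Build a {column name: "enum"} mapping for the columns to force to categorical."""
    missing = [name for name in categorical if name not in column_names]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    return {name: "enum" for name in categorical}


def import_with_enums(path, column_names, categorical):
    """Import a file, forcing the listed columns to enum (needs a running H2O cluster)."""
    # Assumption: a recent h2o-py in which import_file takes col_types.
    import h2o  # imported lazily so enum_col_types works without h2o installed

    h2o.init()
    return h2o.import_file(path, col_types=enum_col_types(column_names, categorical))
```

With the prostate file from the example, `import_with_enums("smalldata/logreg/prostate.csv", ["ID", "CAPSULE", "AGE"], ["CAPSULE"])` would request CAPSULE as a categorical column, assuming the file's header contains those names.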


phil....@gmail.com

Jun 9, 2015, 4:20:23 PM
to h2os...@googlegroups.com, phil....@gmail.com
OK, I will check the parsing example.

I have only a few hundred levels. I think it might be pulling this in as a string because of some underscores, but I'm not sure. I can test that out if that would be useful.

Thanks
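
One way to check the level count and the underscore theory before re-parsing is to count the distinct values of the column outside H2O. This is a minimal sketch using only the Python standard library; the column name `segment` and the sample rows are made up for illustration and stand in for the real file.

```python
import csv
import io
from collections import Counter


def level_counts(lines, column):
    """Count how often each distinct value appears in one column of CSV text."""
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("id,segment\n1,north_east\n2,south\n3,north_east\n")
counts = level_counts(sample, "segment")

print(len(counts), "distinct levels")
print("levels containing underscores:", sorted(v for v in counts if "_" in v))
```

For a real file, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` sample.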