--
You received this message because you are subscribed to the Google Groups "H2O Open Source Scalable Machine Learning - h2ostream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
-- Erin LeDell Ph.D. Statistician & Machine Learning Scientist | H2O.ai
 | col1 | col2 |
---|---|---|
type | int | int |
mins | 1.0 | 2.0 |
mean | 3.0 | 4.0 |
maxs | 5.0 | 6.0 |
sigma | 2.0 | 2.0 |
zeros | 0 | 0 |
missing | 0 | 0 |
0 | 1.0 | 2.0 |
1 | 3.0 | 4.0 |
2 | 5.0 | 6.0 |
Any chance you can make a reproducible example?
I have verified that this works in regular Python h2o, so if you can provide a reproducible example, I will file a bug report. Also please note what version you are using:
Here is a regular h2o Python example:
import h2o
h2o.init()
iris = h2o.import_file(path="https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
iris.names = ["a","b","c","d","e"]
In [9]: iris.names
Out[9]: [u'a', u'b', u'c', u'd', u'e']
On 5/5/17 1:20 PM, Shi Yu wrote:
I created an h2o frame from a PySpark dataframe (sparse vector):
h2o_frame = h2c.as_h2o_frame(all_data)
when I describe it:
h2o_frame.describe()
it shows the automatically named feature names:
feature1, feature2, feature3, ..
I tried to rename them using
h2o_frame.names = myexpected_names
However, it does not work. When I describe the frame, or run the model in H2O Flow, the displayed feature names are still "feature1, feature2, feature3, ..."
How could I change those feature names so they are more meaningful in the H2O figures?
Great,
Looks like you've solved the problem. If I remember correctly, we had this same issue in the h2o Python module a while back (where frames were uploaded directly from disk; not copied from Spark), and the bug was fixed, but maybe the fix doesn't work for frames copied over from Spark for whatever reason.
I filed a bug report here: https://0xdata.atlassian.net/browse/SW-425
Thanks!
-Erin
Yes, good question.
If you google terms like "h2o python docs" or something similar,
you may find older versioned copies of the documentation. What
you're looking for is:
Python module docs: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/index.html
You can find this link from docs.h2o.ai and scroll down to the
Python section. This is the "Python module documentation" ... Not
to be confused with the regular H2O documentation, aka "H2O User
Guide"
For the H2O User Guide: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html Or you can find it by going to docs.h2o.ai and clicking on "H2O User Guide."
Best,
Erin
Yes, it was confusing because h2o.names and h2o.describe show different results. But I did see this https://0xdata.atlassian.net/browse/PUBDEV-2466 and got a hint.
BTW, is there a way to get an up-to-date Python API reference (methods) for H2O? I found many pieces of information here and there, but it is hard to find a go-to place (most of them are for R, not Python).
The Python H2OFrame follows Pandas conventions (as much as we
can)... though we have aliases: columns, col_names, names
In a Pandas dataframe, the way you're doing it doesn't work either:
import pandas
df = pandas.DataFrame([{'c1':3,'c2':10},{'c1':2, 'c2':30},{'c1':1,'c2':20},{'c1':2,'c2':15},{'c1':2,'c2':100}])
df.columns[0] = "a1"  # gives an error
df.columns.values[0] = "a1"  # works
The Pandas Dataframe has a rename method: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html
df = df.rename(columns = {'c1':'bb'})
And the H2OFrame has a set_name (and set_names) method (I opened
a ticket here
to create a rename method that wraps this). It looks like
set_name works but it's buggy: it throws an error yet completes
the rename. We will fix this in the next bug fix release:
https://0xdata.atlassian.net/browse/PUBDEV-4969
import h2o
h2o.init()
hf = h2o.H2OFrame(df)
hf = hf.set_name('c2', 'bb')  # this contains a bug: it throws an error, but at the same time also renames the column
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-59-b1fb03098a83> in <module>()
----> 1 hf = hf.set_name(col = 'c2', name = 'bb')
/usr/local/lib/python2.7/site-packages/h2o/frame.pyc in set_name(self, col, name)
1051 self._frame()._ex._cache.fill()
1052 else:
-> 1053 self._ex._cache._names = self.names[:col] + [name] + self.names[col + 1:]
1054 self._ex._cache._types[name] = self._ex._cache._types.pop(oldname)
1055 return
TypeError: slice indices must be integers or None or have an __index__ method
# see that it's updated
hf.names
# [u'c1', u'bb']
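For anyone curious why the traceback mentions slice indices: line 1053 slices the list of names with col, and here col is the column name 'c2' rather than an integer position. Below is a minimal plain-Python sketch of that failure and of one way to avoid it. The helper rename_column is hypothetical and only for illustration; it is not the h2o implementation and not necessarily how the fix was made.

```python
def rename_column(names, col, new_name):
    """Return a copy of `names` with one entry renamed.

    `col` may be an integer position or an existing column name.
    Hypothetical helper for illustration only; not part of h2o.
    """
    if isinstance(col, str):
        # Resolve the name to its position first, so the slices below
        # always receive an integer (raises ValueError if not found).
        col = names.index(col)
    return names[:col] + [new_name] + names[col + 1:]


names = ["c1", "c2"]

# Slicing a list with a string reproduces the kind of error in the traceback.
try:
    names[:"c2"]
except TypeError as exc:
    print("slicing by name fails:", exc)

# Resolving the name to an index first lets the rename go through.
renamed = rename_column(names, "c2", "bb")
print(renamed)
```

The same helper also accepts a position, e.g. rename_column(names, 0, "a1").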
Follow-up ... that TypeError that you get with set_names was
recently fixed and is working in 3.14.0.3.