(e.g. using boolean masking: df2[df2["B"] == "a", :])
However, first I need to know all the possible values this field has
(e.g. df2["B"].unique()).
Right now I am doing the following, which feels very inefficient and sort of kludgey:
df3 = df2['B'].unique()
for i in range(df3.nrow):
    # pull the i-th unique value of B back to the client as a plain Python value
    val = df3[i, :].as_data_frame(use_pandas=False)[1][0]
    newframe = df2[df2['B'] == val]
    # *** do computation on newframe ***
Additionally, when I do this, it eventually (and seemingly at random) produces an error message such as:
>ERROR MESSAGE:
>
>Temp ID py_417 already exists
Is there a better way to do this?
-Carolyn
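For comparison, here is a minimal plain-Python sketch of the single-pass access pattern: instead of computing the unique values of B and then filtering the whole frame once per value (one full scan per value), walk the rows once and bucket them by their B value. The rows, the column names and the `split_by_column` helper are hypothetical, and the rows are assumed to already be client-side as dicts; on an actual H2OFrame the analogous work would be done on the cluster (e.g. via `df2.group_by("B")`), so this only illustrates the idea, not H2O's API.

```python
from collections import defaultdict


def split_by_column(rows, key):
    """Bucket rows by the value of `key` in a single pass over `rows`.

    `rows` is assumed to be a list of dicts, one per row; within each
    bucket the original row order is preserved.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical example data with columns A and B.
rows = [
    {"A": 1, "B": "a"},
    {"A": 2, "B": "b"},
    {"A": 3, "B": "a"},
    {"A": 4, "B": "c"},
]

groups = split_by_column(rows, "B")
for value, subset in groups.items():
    # *** do computation on subset ***
    print(value, len(subset))
```

The cost is one scan of the data regardless of how many distinct values B has, which is the main difference from the filter-per-value loop.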
Perfect. Thank you!
-Carolyn
As a follow-up question, let's say I now want to create a histogram of the distribution of lengths (row counts) of this set of data frames.
I wrote:
all_lengths = [frame.dim[0] for frame in df_list]  # dim is [nrows, ncols]
However, this call is sloooooowwwwwww. It is taking me much longer to gather this statistic about the frames than it took to construct them in the first place.
Any ideas as to why that is or how this could be improved?
I.e., there is no need to split the H2OFrame into many H2OFrame objects just to take dim[0] of each result; that is pointless work when all you really want is a single pass over the H2OFrame.
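A minimal plain-Python sketch of that single pass, using made-up values of column B: count the rows per distinct value in one scan (these counts are exactly the dim[0] values the sub-frames would have had), then count how many groups have each length to get the histogram. On the H2O side, `df2["B"].table()` or `df2.group_by("B").count().get_frame()` should produce the per-value counts on the cluster as one small frame; treat those calls as pointers to check against the h2o-py documentation rather than verified usage here.

```python
from collections import Counter

# Hypothetical values of column B, one entry per row of the full frame.
b_values = ["a", "b", "a", "c", "a", "b", "d"]

# One pass: number of rows per distinct value of B, i.e. the lengths
# the per-value sub-frames would have had.
lengths = Counter(b_values)

# Histogram of those lengths: how many groups have 1 row, 2 rows, ...
length_histogram = Counter(lengths.values())

for length in sorted(length_histogram):
    print(f"{length} rows: {length_histogram[length]} group(s)")
```

Only the small table of counts needs to come back to the client, rather than one round trip per sub-frame.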
--
You received this message because you are subscribed to the Google Groups "H2O Open Source Scalable Machine Learning - h2ostream" group.
What I really want to do is to extract a frame and then perform an operation that takes into account the order of the rows in the frame.
(in pseudocode)
state_value = initial_value
for i in range(frame.dim[0]):
    state_value = defined_function(state_value, frame[i, :])
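The pseudocode is a left fold over the rows in order. A runnable plain-Python version, assuming the ordered rows have already been pulled to the client as a list of dicts (for an H2OFrame that fits in memory, `as_data_frame()` is one way to do that; indexing `frame[i, :]` once per row would likely mean a round trip to the cluster per row). `step` is a hypothetical stand-in for `defined_function`:

```python
from functools import reduce


def step(state, row):
    """Hypothetical stand-in for defined_function: carry a running sum
    of column A and the number of rows seen so far."""
    total, count = state
    return (total + row["A"], count + 1)


# Hypothetical rows, already sorted in time order.
rows = [{"A": 1.0}, {"A": 2.5}, {"A": 4.0}]

initial_value = (0.0, 0)
state_value = reduce(step, rows, initial_value)
print(state_value)  # (7.5, 3)
```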
The data frame being extracted is a time series, and we want to implement fairly flexible logic to determine whether a certain sequence of events occurred over it.
For example:
1) Did the value in column B drop by more than 50% while the value in column A was rising?
2) Which events are separated by no more than 1 second?
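For what it's worth, both examples can be written as one ordered pass that carries a small amount of state, i.e. the state_value loop with a richer state. A minimal plain-Python sketch, assuming rows are sorted by time and held client-side as dicts with columns A and B, and timestamps are in seconds. The reading of example 1 (B falls below half of its peak within a stretch where A strictly increases at every row) is one possible interpretation, and both function names are hypothetical:

```python
def b_halved_while_a_rising(rows):
    """Example 1 (one interpretation): True if, within some run of rows
    where A strictly increases at every step, B falls below half of the
    highest B seen earlier in that run."""
    peak_b = None
    prev_a = None
    for row in rows:
        a, b = row["A"], row["B"]
        if prev_a is None or a <= prev_a:
            # A is not rising into this row: start a new run here.
            peak_b = b
        else:
            if b < 0.5 * peak_b:
                return True
            peak_b = max(peak_b, b)
        prev_a = a
    return False


def events_within(times, max_gap=1.0):
    """Example 2: index pairs (i, j), i < j, of events whose timestamps
    differ by at most max_gap seconds. `times` must be sorted ascending;
    a sliding window keeps this to one pass plus the size of the output."""
    pairs = []
    start = 0
    for j, t in enumerate(times):
        # Drop events that are now more than max_gap seconds in the past.
        while t - times[start] > max_gap:
            start += 1
        pairs.extend((i, j) for i in range(start, j))
    return pairs


rising = [{"A": 1, "B": 10}, {"A": 2, "B": 8}, {"A": 3, "B": 4}]
broken = [{"A": 1, "B": 10}, {"A": 0, "B": 8}, {"A": 1, "B": 4}]
print(b_halved_while_a_rising(rising))  # True
print(b_halved_while_a_rising(broken))  # False: A fell in between
print(events_within([0.0, 0.5, 1.2, 3.0, 3.9]))  # [(0, 1), (1, 2), (3, 4)]
```

Other rules of this kind can usually be expressed the same way: decide what minimal state each row needs to see from the past, and update it row by row.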
As a result, I doubt a vanilla accumulation method will get us there.
What would it take to do this in H2O?