H2O and R: Need help with running Airline dataset using GLM model


Anu Bulusu

Nov 2, 2015, 1:57:53 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello,

I am new to H2O and R. I am trying to fit a GLM (without regularization) to the airline dataset using H2O from R. My code runs into several errors, which I have included below as comments next to the calls that trigger them. Can someone please help? Thanks for your time.


library(h2o)
localH2O = h2o.init(nthreads = -1)  # Starts a single-node H2O instance using all available cores

pathToData = normalizePath("airline.csv")
# Loading the data from the local dir
airline.hex = h2o.uploadFile(path = pathToData, header = TRUE, destination_frame = "airline.hex")

# Splitting the dataset into training and validation objects

# Method 1

s = h2o.runif(airline.hex, seed = 1234)
airline.train = airline.hex[s <= 0.8,]
airline.test = airline.hex[s > 0.8,]

# Method 2
#airline.split = h2o.splitFrame(data = airline.hex, ratios = 0.8)
#airline.train = airline.split[[1]]
#airline.test = airline.split[[2]]

summary(airline.train)
summary(airline.test)

x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier", "Origin", "Dest", "Distance")
y = "IsDepDelayed"

# Training GLM model
airline.model <- h2o.glm(x=x, y=y, training_frame = airline.train, validation_frame = airline.test, model_id = "glm.model", family = "binomial", lambda_search = FALSE)

# Extracting and handling results

#(1) Making predictions
print("Predict on GLM model")
airline.results = h2o.predict(airline.model, airline.test)

#(2) Calculating metrics Performance and AUC on results file
print("Check performance and AUC")
perf = h2o.performance(airline.model, airline.results)
auc  = h2o.auc(perf)

#Error in .h2o.doSafeREST(conn = conn, h2oRestApiVersion = h2oRestApiVersion,  : Test/Validation dataset has no columns in common with the training set
# If I comment out the perf and auc commands above and rerun, I get the next error listed below

print("Show Distribution of predictions with quantile")
quant = h2o.quantile(airline.results[,2])
print("Extract strongest predictions")
top.airline <- h2o.assign(airline.results[airline.results$YES > quant["75%"]], key = "top.airline")
#Error in expr[[1L]] : subscript out of bounds
top.airline

# Perform classification on validation frame
prediction = h2o.predict(object = airline.model, newdata = airline.test)

# Copy predictions from H2O to R

pred = as.data.frame(prediction)
head(pred)

Amy Wang

Nov 2, 2015, 5:09:27 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hey Anu,

So the documentation and vignette for H2O-3 are actually available here: http://h2o.ai/resources/

The vignette you found at leanpub is deprecated; it was written for the older h2o-classic version.

I've also attached a script that fixes up all the arguments and inputs.

Sincerely,
Amy Wang
airline (2).r
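
The attached script is not reproduced in this thread. For readers without it, here is a minimal sketch of the kind of changes the two error messages in the question point to. It is not the attachment itself. It assumes the H2O-3 R API, that airline.csv sits in the working directory, and that IsDepDelayed has the levels YES and NO (so the prediction frame has a YES column). The use of lambda = 0 to switch off regularization is also an assumption about what "without regularization" should mean here.

```r
library(h2o)
h2o.init(nthreads = -1)

# Assumption: airline.csv is in the working directory, as in the question.
airline.hex <- h2o.uploadFile(path = normalizePath("airline.csv"),
                              destination_frame = "airline.hex")

# Same 80/20 random split as in the question.
s <- h2o.runif(airline.hex, seed = 1234)
airline.train <- airline.hex[s <= 0.8, ]
airline.test  <- airline.hex[s > 0.8, ]

x <- c("Year", "Month", "DayofMonth", "DayOfWeek",
       "UniqueCarrier", "Origin", "Dest", "Distance")
y <- "IsDepDelayed"

# lambda = 0 removes the penalty term; lambda_search = FALSE on its own
# only disables the search over lambda values.
airline.model <- h2o.glm(x = x, y = y,
                         training_frame = airline.train,
                         validation_frame = airline.test,
                         family = "binomial",
                         lambda = 0)

# Score against the test frame, which contains the predictors and the
# response. Passing the predictions frame instead gives the
# "no columns in common with the training set" error.
perf <- h2o.performance(airline.model, newdata = airline.test)
auc  <- h2o.auc(perf)

airline.results <- h2o.predict(airline.model, airline.test)

# Take the 75th percentile of the YES probability column by name, then
# subset rows. The trailing comma selects rows; without it the index
# expression fails.
q75 <- as.numeric(h2o.quantile(airline.results$YES, probs = 0.75))
top.airline <- airline.results[airline.results$YES > q75, ]
head(top.airline)
```

Note that the question takes the quantile of column 2 of the prediction frame but filters on the YES column; selecting the column by name in both places avoids mixing the two probability columns.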