I am new to H2O and R. I am trying to use the airline dataset to fit a GLM with H2O from R, without regularization. My code raises several errors, which I have pasted as comments below the lines that trigger them. Can someone please help? Thanks for your time.
library(h2o)
localH2O = h2o.init(nthreads = -1) # Start (or connect to) a local single-node H2O cluster; nthreads = -1 uses all CPU cores
pathToData = normalizePath("airline.csv")
# Loading the data from the local dir
airline.hex = h2o.uploadFile(path = pathToData, header = TRUE, destination_frame = "airline.hex")
# Splitting the dataset into training and validation objects
# Method 1
s = h2o.runif(airline.hex, seed = 1234)
airline.train = airline.hex[s <= 0.8,]
airline.test = airline.hex[s > 0.8,]
# Method 2
#airline.split = h2o.splitFrame(data = airline.hex, ratios = 0.8)
#airline.train = airline.split[[1]]
#airline.test = airline.split[[2]]
summary(airline.train)
summary(airline.test)
x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier", "Origin", "Dest", "Distance")
y = "IsDepDelayed"
# Training GLM model
airline.model <- h2o.glm(x = x, y = y,
                         training_frame = airline.train,
                         validation_frame = airline.test,
                         model_id = "glm.model",
                         family = "binomial",
                         lambda_search = FALSE)
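# Hedged note on "without regularization": lambda_search = FALSE only turns off the
# search over a path of lambda values. If lambda is not given, H2O still uses its own
# data-dependent default lambda, so the fit above is penalized. A minimal sketch of an
# unpenalized fit passes lambda = 0 explicitly, reusing x, y and the frames above.
# "glm.model.noreg" and airline.model.noreg are hypothetical names for illustration.
airline.model.noreg <- h2o.glm(x = x, y = y,
                               training_frame = airline.train,
                               validation_frame = airline.test,
                               model_id = "glm.model.noreg",
                               family = "binomial",
                               lambda = 0)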
# Extracting and handling results
#(1) Making predictions
print("Predict on GLM model")
airline.results = h2o.predict(airline.model, airline.test)
#(2) Calculating performance metrics and AUC on the prediction results
print("Check performance and AUC")
perf = h2o.performance(airline.model, airline.results)
auc = h2o.auc(perf)
#Error in .h2o.doSafeREST(conn = conn, h2oRestApiVersion = h2oRestApiVersion, : Test/Validation dataset has no columns in common with the training set
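# A likely cause of the error above (hedged): h2o.performance() scores a frame that
# contains the model's feature and response columns. airline.results is the output of
# h2o.predict(), whose columns are predict/NO/YES, so it shares no columns with the
# training set. A sketch that scores the held-out frame instead:
perf = h2o.performance(airline.model, newdata = airline.test)
auc = h2o.auc(perf)
# Since airline.test was also passed as validation_frame, the same metric can be read
# from the validation metrics stored with the model (auc.valid is a hypothetical name):
auc.valid = h2o.auc(airline.model, valid = TRUE)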
# If I comment out my original perf and auc lines (the ones raising the error above) and rerun, I get the next error, shown below
print("Show Distribution of predictions with quantile")
quant = h2o.quantile(airline.results[,2])
print("Extract strongest predictions")
top.airline <- h2o.assign(airline.results[airline.results$YES > quant["75%"]], key = "top.airline")
#Error in expr[[1L]] : subscript out of bounds
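# A hedged sketch of a fix for the step above, assuming airline.results has the
# columns predict, NO, YES (as the $YES reference implies):
# - airline.results[,2] is the NO column, while the filter compares YES values;
# - quant["75%"] depends on the names h2o.quantile() gives its result, so here only
#   the 75th percentile is requested and taken as a plain number (q75 is hypothetical);
# - the trailing comma makes the logical condition an explicit row filter.
q75 = h2o.quantile(airline.results$YES, probs = 0.75)[[1]]
top.airline <- h2o.assign(airline.results[airline.results$YES > q75, ], key = "top.airline")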
top.airline
# Perform classification on validation frame
prediction = h2o.predict(object = airline.model, newdata = airline.test)
# Copy predictions from H2O to R
pred = as.data.frame(prediction)
head(pred)
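# Possible follow-up once the predictions are in R (a sketch, assuming the response
# column is IsDepDelayed as set in y, and that pred keeps the row order of airline.test):
# cross-tabulate actual vs. predicted labels. actual and conf.table are hypothetical names.
actual = as.data.frame(airline.test[, y])
conf.table = table(actual = actual[[1]], predicted = pred$predict)
conf.table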